
Text is Not the Universal Interface

March 18, 2025

I read “Text is the Universal Interface” by roon in the early stages of the current language model boom (it was published a month or so before the release of ChatGPT), and I thought it was an interesting way of describing why LLMs are so useful: they are an evolution of the Unix philosophy of developing tools with text-based interfaces. Reading it again now, however, I realize that it conflates two different notions of what “text” is, and this conflation obscures a fundamental difference between Unix tools and language models.

So what might we mean when we say that a tool has a “text-based interface”?

A lot of times “text-based” really means text as a data format, where a tool can process its input and output as a sequence of tokens, and, importantly, this sequence is readable by humans. The output of classic Unix tools is text-based in this way: run ls in your terminal, and you can read its output as a list of file and directory names. You can compose Unix tools through pipes, where you feed the output of one tool to the input of another. Tools don’t need to agree on a specific binary format; it’s just text! You can read it, other tools can process it. Everything’s easy. In “UNIX Time-Sharing System: Foreword”, Douglas McIlroy puts the philosophy of composable text tools in this way:

Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.
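
As a concrete sketch of that composability (the file names here are invented purely for illustration), consider a pipeline that lists the three largest .txt files in a directory. Each stage reads and writes plain text, so the tools compose without agreeing on any shared binary format:

    # ls emits one line per file; grep keeps only the .txt lines;
    # sort orders them numerically by the size column; tail keeps the last three.
    ls -l | grep '\.txt$' | sort -n -k5 | tail -n 3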

Besides composability, in “always bet on text” Graydon Hoare outlines more advantages of text as a data format:

Text is the most socially useful communication technology. It works well in 1:1, 1:N, and M:N modes. It can be indexed and searched efficiently, even by hand. It can be translated. It can be produced and consumed at variable speeds. It is asynchronous. It can be compared, diffed, clustered, corrected, summarized and filtered algorithmically. It permits multiparty editing. It permits branching conversations, lurking, annotation, quoting, reviewing, summarizing, structured responses, exegesis, even fan fic.

So if the claim that “text is the universal interface” is strictly about text as a data format, I have no quarrel. Non-text formats have their place, but for consuming, transforming, and storing information, text is king.1

But there’s a second notion of “text” at work in roon’s original article, where “text” really means free-form natural language. It is this notion of text that makes language models “text-based” in an interesting way. A tool having a natural language interface implies that its input and output are in text format, so language models enjoy the same benefits that other tools based on text formats enjoy: composability, searchability, indexability, and so on. But language models have natural language interfaces, and Unix tools don’t.

In fact, almost nothing in computing has a natural language interface. Of course there have been chatbots before LLMs, like Apple’s Siri and Amazon’s Alexa, but they’ve been limited in scope. LLMs are not like this. In roon’s telling, LLMs are a kind of Swiss army knife, where you can solve any business or engineering task so long as you can formulate the problem and its solution in natural language:

Slowly but surely, we can see a new extension to the UNIX credo being born. Those who truly understand the promise of large language models, prompt engineering, and text as a universal interface are retraining themselves to think in a new way. They start with the question of how any new business process or engineering problem can be represented as a text stream. What is the input? What is the output? Which series of prompts do we have to run to get there?
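
To make that “series of prompts” concrete, here is a purely hypothetical sketch in the same pipe-and-filter style. The llm command stands in for an imaginary command-line client that sends its standard input, together with an instruction, to a model and prints the reply; the tool name and the input file are made up for illustration:

    # Each stage is a prompt over a text stream: first summarize the tickets,
    # then turn the summary into draft replies.
    cat support_tickets.txt \
      | llm "Summarize the main complaints in these tickets" \
      | llm "Draft a short reply addressing each complaint"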

This is the reading of “text is the universal interface” that is interesting, and indeed it has captured the minds (and money) of a lot of people. The idea that you can automate away most white-collar (read: computer-facing) work is as amazing as it is terrifying. Not only that, a natural language interface makes it much easier to hook up LLMs to the world outside the computer. Just think about the enormous volume of Slack messages, emails, internal wikis, and Word documents that constitutes institutional memory at your workplace. There are other ways of communicating information, of course—there are Zoom meetings, after-work chats at the bar—but I don’t think it’s too much of a stretch to say that “text is the universal interface” within organizations.

The example of LLMs for coding is instructive. Here we have a task that is at its core text-based (regardless of whatever fancy IDE you use, you are still editing text), where the tool of choice—a programming language—has a sophisticated yet precise interface that requires some amount of expertise to use. The idea is tantalizing: instead of getting bogged down in the minute details of writing code, you can just describe what the program is supposed to do, and the model writes the code for you. LLMs for coding promise to democratize software development, allowing anyone to be a “vibe coder” without much trouble. And even though there are some rough edges—large projects don’t fit in context windows, the model hallucinates APIs, the generated code contains subtle bugs and security vulnerabilities—they’re already pretty useful. Besides, “this is the worst the models will be,” as people say, because they’ll only get better. It’s not hard to imagine a future where software engineers will be a distant memory from a bygone era, like switchboard operators or lamplighters.

Perhaps, perhaps.2 But we’re not there yet, and it behooves us to take stock of the pitfalls of using LLMs as they currently exist. For that, I turn to, of all people, Edsger Dijkstra. In an address to the ACM in 1984, he describes two “alchemical myths” that purport to relieve us of the burdens of programming.

The first myth is the idea that we just need a programming language with the right kind of features to solve all our programming problems. Dijkstra analogizes this to the Philosopher’s Stone, a mythical substance that transformed base metals like lead into precious metals like gold. The promise of the “Programmer’s Stone” is that it would transform the “base metals” of our current practice of software development into the “precious metals” of an enlightened practice that allows us to develop programs with complex requirements on time, on budget, and without bugs.

The second myth Dijkstra describes is the idea that, instead of programming languages, in the future we would use “no-code” tools that require no expertise. He analogizes this to the Elixir of Life, a substance that supposedly conferred immortality. The promise of the programmer’s Elixir of Life is not to transform programming but rather to eliminate it altogether, much as the alchemists’ Elixir was supposed to eliminate death.

You can imagine what Dijkstra thinks of these two alchemical myths of programming. I cannot find a better description of the pitfalls of LLMs than his criticism of programming elixirs:

The major attraction of the modern elixirs is that they relieve their consumers from the obligation of being precise by presenting an interface too fuzzy to be precise in: by suppressing the symptoms of impotence they create an illusion of power.

This is essentially Andrej Karpathy’s notion of “jagged intelligence” avant la lettre. The reason language models are so compelling is that their natural language interface creates “an illusion of power”: you can formulate a dizzying array of problems in natural language, and because LLMs can take these problem descriptions as input, you expect that they can provide solutions. But they can’t solve all of these problems—and worse, it’s not even clear when they can’t, since LLMs will happily hallucinate a wrong solution.

To really understand why Dijkstra is pessimistic about these alchemical myths, we need to pull back a bit. In his address, Dijkstra sums up the promise and the challenge of computing in this beautiful passage:

[T]he general purpose computer is no more than a handy device for implementing any thinkable mechanism without changing a single wire. That being so, the key question is what mechanisms we can think of without getting lost in the complexities of our own making…

Computers hold the promise that they can implement “any thinkable mechanism,” without any physical modification. That’s a big promise—anything we can think of, we can implement! But, of course, the rub is in the implementing. We can make the computer do anything, so long as we can write a program that makes it do what we intend. And because we can make the computer do anything, these programs can be arbitrarily complicated. Nothing is stopping us from making our programs a mess.

The fundamental challenge of programming, then, is to manage complexity. Almost like a physical law, software complexity inevitably increases over time. To resist the rising tide of complexity, we must have the “austere intellectual discipline,” as Dijkstra puts it, of “keeping things sufficiently simple.” This discipline requires from us careful thought about the way we design and implement software. There is no easy way out.

LLMs undoubtedly have a place in helping us manage the complexity of software. Ask them questions about some library documentation or gnarly code that someone else wrote; ask them about the particulars of the semantics of a programming language you’re not quite used to; have them write boring scripts you don’t really want to write. But they are no programming elixirs.

You can probably generalize Dijkstra’s criticism to any domain in which we might want to use LLMs. The world is a very complicated place, and to navigate it requires careful thought. We do not have to do this thinking unaided by tools, but the tools will not replace the thinking. I mean, let’s be clear here: people are going to try to replace the thinking, but it’s not going to go well.

To end, let us go back to roon’s original point, that LLMs are an evolution of the classic text-based Unix tools. Not only does this conflate text as a data format with text as natural language; the jaggedness of LLM intelligence also renders the comparison rather unfortunate. A language model is a monolithic tool that can seemingly do anything, yet can really only do some of those things well. This violates perhaps the most important maxim of Unix philosophy. Again, from Douglas McIlroy:

Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new features.

Of course, not everything in computing needs to accord with Unix philosophy, however much we look up to it as a guiding light for designing and implementing software. But I think that framing LLMs as an evolution of Unix tools gives them an air of time-tested design that they perhaps do not deserve. We still use grep and diff, more than fifty years after the development of Unix. LLMs, amazing and full of potential as they are, have yet to prove their longevity.


  1. Since Graydon Hoare is most famous for developing Rust, I tend to interpret his post as a commentary on the futility of “visual” programming languages. I don’t disagree with his general prescription, but visual PLs have filled a niche in areas like education and music.↩︎

  2. I am personally skeptical of claims that we can ride “scaling laws” all the way up to AGI. In the “free lunch” era of computing, before the breakdown of Dennard scaling, you could just wait for new hardware and your programs would run faster. Now you have to make your programs concurrent to squeeze as much performance as you can from modern hardware. It seems like we’re still in the “free lunch” era of AI—you can just wait for better models and their outputs on the same prompts will be better. So it seems that scaling laws still hold, most recently thanks to “test-time compute.” But perhaps there’s an AI version of the breakdown of Dennard scaling in the offing. Don’t ask me when we’re going to hit “the wall,” though.↩︎