April 14, 2024 Léopold Mebazaa

Instructions is all you need

When LLMs meet DSLs

AI agents may need custom programming languages to succeed

AI agents promise a future where repetitive tasks that are traditionally performed manually are automated. But, in the present, these agents are slow, clunky, and error-prone. Why is that?

Some believe the current failures of AI agents will disappear as LLMs increase in size and reliability. I suspect that won’t be sufficient: AI agents, as they’re designed today, are conceptually flawed.

The impasse of the “Choose Your Own Adventure” agents

For those unfamiliar, there are two main types of AI agents. I refer to the first type as a “Choose Your Own Adventure” agent. One seminal example is natbot, a CYOA agent that controls the web browser. There have been many successors to natbot, but the core mechanics tend to stay the same.

CYOA agents require the user, at the outset, to provide a series of actions, such as clicking, scrolling, or typing, that the agent will use to accomplish the prompted task. At each step, the agent chooses one of these actions until it succeeds. For example, one could ask a CYOA agent to book a flight between San Francisco and Hawaii. The agent would work by selecting one of the predefined actions at every step (in this case, on every page) until the plane ticket is booked and the task is deemed accomplished.
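The core loop described above can be sketched in a few lines. This is a minimal illustration, not natbot’s actual code: the action names and the `choose_action` stub are hypothetical stand-ins for the LLM call a real agent would make at each step.

```python
# Minimal sketch of a "Choose Your Own Adventure" agent loop.
# The action vocabulary and choose_action stub are hypothetical,
# not the API of natbot or any real agent.

ACTIONS = ["click", "scroll", "type", "done"]

def choose_action(page_state, goal):
    """Stand-in for an LLM call: pick one predefined action per step.

    A real agent would send the current page state and the goal to an
    LLM and parse its answer back into one of the allowed actions.
    """
    return ("done", None)  # this stub declares success immediately

def run_agent(goal, max_steps=10):
    history = []
    for _ in range(max_steps):
        action, arg = choose_action("<page state>", goal)
        history.append(action)
        if action == "done":  # the agent deems the task accomplished
            break
    return history

print(run_agent("book a flight SFO -> HNL"))
```

Note how every iteration requires a fresh model call before anything happens on the page, which is where the latency discussed below comes from.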

While these agents are quite impressive, they have not proven to be useful. Why not?

First, CYOA agents are inefficient. These agents, in general, take several seconds to decide on the best course of action. Performing the same tasks manually is faster, especially given how error-prone these agents still are.

Moreover, CYOA agents are not designed for repetition. Automating tasks is only useful if one can repeat actions efficiently. Unfortunately, there has been very little development in this area. Currently, executing the same task requires the user to ask the CYOA agent the same thing, every time. This is a lengthy and expensive process, even if the agent succeeds every time. As it stands, at scale, writing a script is still cheaper and more efficient.

Finally, CYOA agents don’t model complex behavior. They are designed to mimic human behavior, and can thus only execute a linear sequence of actions. This means they can’t handle loops or other control flow.

The perils of code-generating agents

The second type of AI agent generates code (e.g. Python) before executing it. Currently, the most ubiquitous code-generating agent is OpenAI’s Code Interpreter. Code Interpreter is generally used for math and data analysis, but it can do many other tasks.

Code-generating agents are theoretically more powerful than CYOA agents because they are not limited to performing a series of predefined actions. But these agents also present significant problems.

First, code-generating agents do not have enough formal guardrails. Code that is written and executed by AI with little human supervision raises the possibility of a security disaster. While OpenAI’s Code Interpreter operates within the confines of some guardrails, other code-generating agents might not.

Guardrails around AI agents are essential for more than simply preventing a rogue AI. For example, companies that wish to automate their customer service systems will want to ensure that these agents comply with all of the company’s internal policies. Code-generating agents will therefore not be widely adopted without strong guarantees of formal guardrails. Agents that freestyle in Python simply cannot promise these guarantees.

Secondly, code-generating agents, like CYOA agents, are inefficient. Take the example of web browsing. Code intended to scrape and crawl websites usually runs to hundreds of lines or more. Large parts of that code are boilerplate and unoptimized, not to mention unsafe. And because the first pass of code is often rife with errors, the process of correcting it through trial and error becomes long and costly. Using Code Interpreter through OpenAI’s API currently costs five cents per session.

Thirdly, bigger and better code-generating models might in fact create new problems. As AI code-generating agents advance, we may reach a point where these agents produce so much code that manual review becomes all but impossible. This problem will exist even if formal guardrails are implemented, and even if these LLMs are able to produce safe, optimized code on the cheap. Traditional programming languages might not be the right tool for agents to code in, as they will produce code that is too long for human review.

Why custom programming languages might be the missing piece

I think that AI agents might be improved by the introduction of domain-specific languages (DSLs). More specifically, there should be a collection of domain-focused DSLs that agents can use for whatever task they are given.

An agent that generates code in a DSL would synthesize the strengths of both types of agents: like a CYOA agent, it would be restricted to a small set of safe, domain-specific operations; like a code-generating agent, it could express loops and control flow; and its programs would be short enough for humans to review.
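To make the idea concrete, here is one possible shape such a system could take. The DSL syntax, the operation names, and the interpreter below are entirely hypothetical — no such language exists yet — but they illustrate how a whitelist of domain operations acts as a formal guardrail while keeping programs short and reviewable.

```python
# Hypothetical browsing DSL: each statement is one whitelisted operation.
# Because the interpreter rejects anything outside the whitelist, the
# agent cannot "freestyle" the way it could in general-purpose Python.

ALLOWED = {"open", "fill", "click"}  # hypothetical operation set

PROGRAM = """
open https://example.com/flights
fill origin SFO
fill destination HNL
click search
"""

def run(program):
    trace = []
    for line in program.strip().splitlines():
        op, *args = line.split()
        if op not in ALLOWED:
            # Formal guardrail: unknown operations are refused, not executed.
            raise ValueError(f"operation {op!r} is not permitted")
        trace.append((op, args))  # a real interpreter would drive a browser here
    return trace

print(run(PROGRAM))
```

A four-line program like this is trivially reviewable by a human, and the interpreter, not the model, is what enforces the policy — which is the guarantee general-purpose code generation cannot offer.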

Over the past decades, as the need to understand the internal workings of the computer has declined, computer programming has become far more abstract. We went from zeros and ones to assembly to C to Python. Many settled on the latter because it is a good choice for manually written code that is reviewable by others. As we enter a new era with AI, we might need to reach a new level: DSLs for AI-written code that is reviewable by the rest of us. And I’m working for Columbia University until this fall to build precisely that.

If you want to chat – My email is lemeb ‘at’ this domain.

Thanks to Natasha Esponda, Fred Kjolstad, Alex Ozdemir, Eric Pennington, and Michael Völske for their feedback.