live2026-05-30·2 min read

Two agents, one workflow

Turning natural language into an executable workflow with one agent is a slog. I split it into a fast agent that scopes and a heavy one that builds — and the win came from giving each its own clean context.

The job sounds like one task: take a sentence — "when a crypto price crosses X, message me on Slack" — and emit a workflow that actually runs. So you reach for one agent in a loop. It works on the demo and falls apart on anything real: the context bloats, the function-calling loop spins, the JSON fragments.

The shape that holds is two agents. A chat agent that's fast and cheap: it scopes the request, asks the one clarifying question, and emits a compact spec. A designer agent that's heavier: it takes that spec and authors the workflow, planning the steps up front instead of consulting a model after every action.

chat · scope
designer · build
natural-language request → intent + one clarifying question
← compact spec (the distilled hand-off)
plan all steps up front → author typed workflow
← runnable workflow + trace
Fig 1 — the handoff. A compact spec crosses between two isolated contexts; neither agent carries the other's clutter.

It's tempting to read this as "smaller prompts." That's not the win. The win is independent context windows. Anthropic's own multi-agent numbers are blunt about it: a lead + sub-agents beat a single strong agent by ~90%, and the improvement tracked token budget and the ability to spread reasoning across separate contexts — not cleverness. The scoping agent never has to see the build's mess; the builder never re-derives the intent.

~90%lead+subagents over single agent (Anthropic)
~15×the token tax a multi-agent system pays

You state that tax out loud, because it's real. Splitting only pays on work that parallelizes; for tightly-coupled tasks a single fast model still wins. And the failure I spent the most time on has a precise root cause worth knowing:

There's a topology argument hiding here too. Flat swarms of peer agents amplify errors — one analysis put a "bag of agents" at ~17× error inflation versus ~4× for a hierarchy with a gatekeeper. Two agents in a scope→build shape isn't minimalism for its own sake. It's the smallest topology that puts one agent in charge of the other's output, which is the thing that stops errors from echoing.

The rest is plumbing that earns the split back: a schema cache, parallel pre-warm so hot tools aren't cold, history compression near the context limit, and token telemetry on every call so cost is a number you watch, not a surprise you discover.