Designing intelligence into products.

I build agent-native systems — workflow runtimes, LLM serving pipelines, and the evals that keep them honest — and write about the seam between a model and a product.

01 Selected work

LLM COMPILER

Forge: intent → typed LLM functions

Decision

Bet that reliability comes from STRUCTURE, not model intelligence — constrained generation, atomic decomposition, a validated IR spine, type contracts.

Outcome

Small models (down to 8B-class) reliably write workflows that look like they need a frontier model — far fewer tokens, far higher pass rate.

WORKFLOW RUNTIME

The graph is read-only

Decision

Made the visual graph a read-only projection of typed functions — parsed from the AST — instead of a thing you edit both ways.

Outcome

Eliminated round-trip drift, the failure that has dogged two-way visual editors for 20 years, by construction: the model authors typed code, and the compiler is the gate.
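A minimal sketch of the one-way projection idea, assuming Python and its standard `ast` module (the runtime's actual language and IR are not stated here). `SOURCE`, `project_graph`, and the three toy functions are hypothetical names for illustration. The graph is derived from the code and handed back as immutable structures, so there is nothing to edit back.

```python
import ast
from types import MappingProxyType

# Hypothetical typed workflow source; in this sketch it is the only thing anyone edits.
SOURCE = '''
def fetch(url: str) -> bytes: ...
def parse(raw: bytes) -> dict: ...
def run(url: str) -> dict:
    return parse(fetch(url))
'''

def project_graph(source: str):
    """Derive a read-only graph (typed nodes, call edges) from function definitions."""
    tree = ast.parse(source)
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    nodes = {}
    edges = set()
    for name, fn in funcs.items():
        # Node payload: parameter (name, type) pairs plus the return type.
        params = tuple(
            (a.arg, ast.unparse(a.annotation) if a.annotation else "Any")
            for a in fn.args.args
        )
        returns = ast.unparse(fn.returns) if fn.returns else "Any"
        nodes[name] = (params, returns)
        # Edge: this function calls another function defined in the same source.
        for node in ast.walk(fn):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in funcs
            ):
                edges.add((name, node.func.id))
    # Immutable views: the projection can be rendered but not written to.
    return MappingProxyType(nodes), frozenset(edges)

nodes, edges = project_graph(SOURCE)
```

Changing the graph means changing `SOURCE` and re-projecting; assigning into `nodes` raises `TypeError`.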

RETRIEVAL

A retriever for agents, not eyes

Decision

Built a recall stage plus cross-encoder rerank for an agent that calls it in a loop. It returns a summary plus structured pages, not a search box.

Outcome

Recall gains proven with paired significance tests on a public set — real, not a vibe.
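One common form of paired test is a sign-flip permutation test on per-query differences. The sketch below assumes that form; the page does not say which test was actually used, and `paired_permutation_test` and its parameters are hypothetical names. Both systems are scored on the same queries, and the test asks how often a difference this large would appear if the two labels were interchangeable.

```python
import random

def paired_permutation_test(baseline, candidate, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-query scores.

    Returns (mean difference, p-value). Inputs are per-query recall values
    for the same queries, in the same order.
    """
    if len(baseline) != len(candidate):
        raise ValueError("scores must be paired: one per query for each system")
    if not baseline:
        raise ValueError("need at least one query")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    n = len(diffs)
    observed = sum(diffs) / n
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null, each pair's sign is equally likely to be flipped.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value
```

Pairing matters because query difficulty varies far more than system quality does; differencing per query removes that variance before the test runs.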

0 → 1

Voice-first coding on AR glasses

Decision

Two-layer agent: an OS-level brain for intent + a safety countdown, an editor-level executor for the edits — driven by voice from AR glasses.

Outcome

Hands-free, glanceable multi-step engineering, away from a keyboard.

02 Writing

Prompts teach patterns, not facts

The single move that mattered most across every version of my workflow engine — drag the hard-coded facts out of the prompt and let them arrive at runtime, so the prompt only ever teaches shape.

2026-06-02 · 2 min

The graph is read-only

I rebuilt a workflow engine around one rule — code is the single source of truth, the visual graph is a projection of it. Twenty years of round-trip tools say that's the only direction that holds.

2026-06-01 · 2 min

Two agents, one workflow

Turning natural language into an executable workflow with one agent is a slog. I split it into a fast agent that scopes and a heavy one that builds — and the win came from giving each its own clean context.

2026-05-30 · 2 min

The Seam

Everything I write circles one problem — turning a probability distribution into something a person can trust. This is the throughline. Start here.

2026-05-28 · 1 min

Evals are the product

Most AI products don't fail on the model. They fail on the absence of a way to know whether a change made things better. The eval harness — not the model — is the asset you own.

2026-05-27 · 2 min

A retriever for agents, not eyes

A search box is built for a human who scans one page of results. A retriever for an agent is a function it calls in a loop. That one difference changes the whole pipeline — and how you prove it got better.

2026-05-26 · 2 min

An LLM backend that survives concurrency

A backend that "survives concurrency" is making a tail-latency claim, not an average one. Here is the staged pipeline I run behind an AI answer box, and the three boring guardrails that keep the slow tail from eating everyone.

2026-05-22 · 2 min

03 What I work on

Reliability from structure

Constrained generation, atomic decomposition, a validated IR. Make a small model reliable by building the reliability into the structure — not by buying it with a bigger model.

Agent-native runtimes

Code as the source of truth — typed workflows, read-only graph projections, designers that author skills. Built for models as the primary author.

Tests as the net

Paired significance tests, probe suites, honest baselines: the net that confirms an intuition, not the gate that replaces it. They range over the whole stack, from TTS to retrieval to LLMs.