/ now
What I'm building, reading, and changing my mind about.
Forge — a compiler for LLM functions
Compiling plain language into typed, runnable workflows that SMALL models can get right — by keeping the model on rails (tight constraints, small pieces, type checks) instead of leaning on raw smarts. The grand version of everything before it.
This notebook
Deep, illustrated essays on agent-native systems — workflow runtimes, retrieval for agents, and the tests that catch what intuition misses. Pattern over fact.
Reliability at the tail
Eval calibration, judge bias, paired significance testing — the failure modes that never show up in the demo and always show up in production.
AI product & systems work
0→1 agent-native products and the eval/quality systems behind them. Based in Shenzhen, happy to work remotely.
A great demo predicts a great product.
It predicts a great happy path. Production is the tail.
More autonomy is the goal.
A legible, steerable copilot beats an opaque autopilot almost every time.
A bigger model fixes quality.
Most of my wins came from error analysis and routing, not a model swap.