85% Less Cost. 9/10 Quality. Same Model.
UpCommander is an AI coding orchestrator validated across 52+ controlled benchmark runs. One structured brief written before the AI starts, CONTRACT.md, cuts cost by 54% on its own ($5.45 to $2.51 per task).
The Stacked Numbers
Same model (Sonnet 4.6). Same codebase. Each row adds one change on top of the previous. All results measured — not projected.
| Approach | Cost | Quality |
|---|---|---|
| Naive baseline (no optimization) | $5.45 | 5/10 |
| + CONTRACT.md brief | $2.51 | 9/10 |
| + AST index compression | $0.85 | 9/10 |
| + Lightweight orchestrator | $0.83 | 9/10 |
Measured on a production Next.js/TypeScript/Supabase codebase. Sonnet 4.6. Raw data in /evaluations on GitHub.
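The percentages quoted on this page can be rechecked from the table. The sketch below is illustrative only: the `Step` type and the `percentSaved` and `reductions` helpers are made-up names, not part of UpCommander, and the only real data is the four dollar figures copied from the table.

```typescript
// Illustrative only: recompute the savings from the table's measured costs.
interface Step {
  label: string;
  cost: number; // USD per task, copied from the table above
}

const steps: Step[] = [
  { label: "Naive baseline", cost: 5.45 },
  { label: "+ CONTRACT.md brief", cost: 2.51 },
  { label: "+ AST index compression", cost: 0.85 },
  { label: "+ Lightweight orchestrator", cost: 0.83 },
];

// Percent saved going from `from` to `to`, rounded to a whole percent.
function percentSaved(from: number, to: number): number {
  return Math.round(((from - to) / from) * 100);
}

// For each row after the baseline: saving vs the previous row and vs the baseline.
function reductions(rows: Step[]): { label: string; vsPrevious: number; vsBaseline: number }[] {
  return rows.slice(1).map((row, i) => ({
    label: row.label,
    vsPrevious: percentSaved(rows[i].cost, row.cost), // rows[i] is the row just before `row`
    vsBaseline: percentSaved(rows[0].cost, row.cost),
  }));
}
```

On these numbers the brief alone saves 54% against the baseline, the AST index saves a further 66% against the previous row, and the full stack lands at 85% below the baseline.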
How It Works
Contract → worker → done. No retry loops. No review passes. One clean pass with the right context.
Generate a CONTRACT.md
Before any worker touches code, generate a structured brief: exact TypeScript interfaces, column names, import paths, SQL conventions, and explicit non-goals. The AI stops exploring and starts executing.
npx upcommander contract "add pagination to notes API"
# Generates CONTRACT.md with:
# - Exact interfaces and types
# - DB column names + SQL conventions
# - Import paths (no guessing)
# - Explicit non-goals
# Takes ~30 seconds, costs ~$0.15
Run the worker
The K2 worker (Sonnet + CONTRACT + compressed codebase index) executes your task. No retry loops — one clean pass with full context. Typical task: 2–8 minutes.
npx upcommander run "add pagination to notes API"
# Streams output in real time
# Uses L1 codebase index (8K tokens)
# 91% compression vs raw file reads
# Cost: ~$0.08–0.30 depending on task size
Review the benchmark data
52+ controlled runs across V1/V2/NS/V2O strategies, multi-model comparison, and cross-vendor grading. All raw JSON in the repo. Replicate it on your own codebase.
# 4 conditions, same task, same model
# V1 (no contract): $5.45, 5/10 quality
# V2 (CONTRACT): $2.51, 9/10 quality
# + AST compress: $0.85, 9/10 quality
# + Haiku orch: $0.83, 9/10 quality
#
# -85% cost. Quality 5/10 → 9/10.
What the Benchmarks Found
52+ runs across 4 strategies, 3 models, and 3 independent graders. Each finding validated at N≥5 before it went in.
CONTRACT.md beats everything else
A structured brief before the task — exact interfaces, column names, import paths, non-goals — reduces cost 54% and raises quality from 5/10 to 9/10. Same model. Just better context.
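To make the idea of a structured brief concrete, here is a minimal sketch of how one might be modelled and rendered to markdown. The actual CONTRACT.md schema is not shown on this page, so the `Contract` type, its field names, the `renderContract` helper, and every value in `example` (including the import path and column names) are assumptions for illustration.

```typescript
// Hypothetical shape of a brief; field names are assumptions, not UpCommander's schema.
interface Contract {
  task: string;
  interfaces: string[]; // exact TypeScript interfaces, as source text
  columns: Record<string, string[]>; // table name -> column names
  importPaths: string[];
  sqlConventions: string[];
  nonGoals: string[];
}

// Render one bulleted section, or nothing when the list is empty.
function section(title: string, items: string[]): string {
  if (items.length === 0) return "";
  const bullets = items.map((item) => "- " + item).join("\n");
  return "## " + title + "\n" + bullets + "\n\n";
}

// Turn the structured brief into markdown text a worker could be given.
function renderContract(c: Contract): string {
  const columnLines = Object.keys(c.columns).map(
    (table) => table + ": " + c.columns[table].join(", ")
  );
  return (
    "# CONTRACT: " + c.task + "\n\n" +
    section("Interfaces", c.interfaces) +
    section("Columns", columnLines) +
    section("Import paths", c.importPaths) +
    section("SQL conventions", c.sqlConventions) +
    section("Non-goals", c.nonGoals)
  );
}

// Made-up example values for the pagination task used elsewhere on this page.
const example: Contract = {
  task: "add pagination to notes API",
  interfaces: ["interface NotesPage { items: Note[]; nextCursor: string | null }"],
  columns: { notes: ["id", "user_id", "created_at"] },
  importPaths: ["@/lib/notes"],
  sqlConventions: ["snake_case column names"],
  nonGoals: ["No UI changes", "No schema migrations"],
};
```

Calling `renderContract(example)` yields a short markdown document with one section per field. The point of the structure is that every name the worker needs is stated up front instead of being discovered by reading files.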
AST compression: up to 91% fewer tokens
Instead of sending raw source files, extract only exported symbols and types using AST parsing. 85–91% token reduction on a real production codebase. Zero quality tradeoff.
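The sketch below shows the shape of such an index: keep the exported declarations, drop the bodies. To stay dependency-free it uses a line-based regex scan as a stand-in for real AST parsing, so it only handles single-line `export` declarations and will miss multi-line signatures, re-exports, and object-typed parameters. `buildIndex`, `savedFraction`, and the sample source are hypothetical, not UpCommander's implementation.

```typescript
// Stand-in for AST parsing: matches lines that begin an exported declaration.
const EXPORT_LINE =
  /^export\s+(?:default\s+)?(?:async\s+)?(?:function|const|let|class|interface|type|enum)\b/;

// Keep only exported declaration heads, cutting each line at its first "{" or "=".
function buildIndex(source: string): string[] {
  return source
    .split("\n")
    .filter((line) => EXPORT_LINE.test(line))
    .map((line) => line.replace(/\s*[{=]\s*.*$/, "").trim());
}

// Fraction of characters removed by indexing (a rough proxy for token savings).
function savedFraction(source: string, index: string[]): number {
  return 1 - index.join("\n").length / source.length;
}

// Made-up module used only to exercise the sketch.
const sample = [
  'import { db } from "./db";',
  "export interface Note {",
  "  id: string;",
  "  body: string;",
  "}",
  "export async function listNotes(userId: string, limit: number): Promise<Note[]> {",
  '  return db.query("notes", userId, limit);',
  "}",
  "function internalHelper(): void {}",
].join("\n");

const sampleIndex = buildIndex(sample);
```

For the sample above the index holds two entries, the `Note` interface head and the `listNotes` signature, and the private helper is dropped. A production version would presumably walk a real syntax tree so that multi-line declarations and type bodies survive intact.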
Haiku for boilerplate workers
Haiku + CONTRACT.md = 9.0/10 at 64% of Sonnet's cost. Scaffolding does most of the work. Route cheap models to boilerplate, Sonnet to design decisions.
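A routing rule like this can be very small. The sketch below assumes just two task kinds and treats every task as costing the same on Sonnet; `TaskKind`, `pickModel`, and `batchCost` are hypothetical names, and the only figure taken from this page is the 64% relative cost.

```typescript
// Hypothetical task kinds and model labels; the real router's categories are not shown here.
type TaskKind = "boilerplate" | "design";
type Model = "haiku" | "sonnet";

// Cheap model for boilerplate, Sonnet for design decisions.
function pickModel(kind: TaskKind): Model {
  return kind === "boilerplate" ? "haiku" : "sonnet";
}

// Relative cost per task, in units of "one Sonnet task" (0.64 is the figure quoted above).
const RELATIVE_COST: Record<Model, number> = { haiku: 0.64, sonnet: 1.0 };

// Estimated cost of a batch, assuming every task would cost the same on Sonnet.
function batchCost(kinds: TaskKind[]): number {
  return kinds.reduce((sum, kind) => sum + RELATIVE_COST[pickModel(kind)], 0);
}
```

Under those assumptions, a batch of two boilerplate tasks and one design task costs about 2.28 Sonnet-task units instead of 3.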
Lightweight orchestrator
Replace expensive stateful orchestration with a Haiku model that reads and writes a compact state file. Orchestration cost drops 96%, and the orchestrator adds only 8% to total end-to-end cost.
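As a rough picture of what a compact state file could hold, the sketch below models the contract → worker → done flow as a tiny JSON-serializable record. The state format is not documented on this page, so `OrchestratorState`, `advance`, `serialize`, and `deserialize` are all hypothetical, and the model call that would decide what to write is left out.

```typescript
// Hypothetical state shape; the real state file format is not documented on this page.
type StepName = "contract" | "worker" | "done";

interface OrchestratorState {
  task: string;
  step: StepName;
  notes: string[]; // short status lines, kept small on purpose
}

// The linear pipeline described above: contract, then worker, then done.
const NEXT_STEP: Record<StepName, StepName> = {
  contract: "worker",
  worker: "done",
  done: "done",
};

// Advance one step and append a note; returns a new state and leaves the input untouched.
function advance(state: OrchestratorState, note: string): OrchestratorState {
  return { task: state.task, step: NEXT_STEP[state.step], notes: state.notes.concat(note) };
}

// The state travels as compact JSON text, which is what would be written to the file.
function serialize(state: OrchestratorState): string {
  return JSON.stringify(state);
}

function deserialize(text: string): OrchestratorState {
  return JSON.parse(text) as OrchestratorState;
}
```

Because the whole state is a few short fields, each orchestration turn only has to read and write a small blob rather than carry the conversation history forward.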
What Doesn't Work
These three approaches sound reasonable. The benchmarks show they make things worse.
Agent Teams
Every agent loads the full codebase context independently. Three agents = three copies of your 80K-token context. Quality doesn't improve; cost multiplies. Measured across N=5 runs.
Retry loops
When a model retries, it regenerates entire files instead of making surgical edits. Fixing a broken import path means rewriting the whole route file — and losing the correct CRUD endpoints. Tested across 15 retry attempts.
Opus review pass (V2O)
When the CONTRACT is well-formed, Sonnet already hits 9.8/10. A clean N=5 retest with full Opus one-shot review: same quality, +56% cost. Write a better brief instead.
Get Started in 60 Seconds
Requires Node 18+, an Anthropic API key, and a codebase you want to work on. The CLI is a standalone package — no build step, no account.
# Install
npm install -g @upgpt/upcommander-cli
# Set your Anthropic key
export ANTHROPIC_API_KEY=sk-ant-...
# Generate a contract + run the worker
upcommander run "add pagination to the notes API"
# Or: generate the contract first, review it, then run
upcommander contract "add pagination to the notes API"
upcommander run "add pagination to the notes API"
Open Source, BYOK, No Telemetry
The full orchestration stack — CONTRACT generation, AST index compression, model routing, multi-provider API client (Anthropic, OpenAI, Google, OpenRouter), and benchmark runner — is MIT licensed. Bring your own API keys. All benchmark data is in /evaluations so you can replicate every number.
Want This Running on Your Stack?
UpGPT designs, builds, and runs agentic AI solutions for businesses — applying UpCommander's scaffolding, multi-model routing, and compression stack from day one. You get the optimized state without the three weeks of benchmark iteration.
Talk to Us About Your Stack
No commitment. We'll tell you whether UpCommander is right for your use case.