Open Source · 52+ Benchmark Runs · MIT

85% Less Cost. 9/10 Quality. Same Model.

UpCommander is an AI coding orchestrator validated across 52+ controlled benchmark runs. A single structured brief written before the AI starts, CONTRACT.md, delivers a 54% cost reduction on its own: the largest single lever in the stack.

52+ controlled runs
-85% cost reduction
9/10 quality score
MIT open source

The Stacked Numbers

Same model (Sonnet 4.6). Same codebase. Each row adds one change on top of the previous. All results measured — not projected.

Approach                           Cost    Quality
Naive baseline (no optimization)   $5.45   5/10
+ CONTRACT.md brief                $2.51   9/10
+ AST index compression            $0.85   9/10
+ Lightweight orchestrator         $0.83   9/10

Measured on a production Next.js/TypeScript/Supabase codebase. Sonnet 4.6. Raw data in /evaluations on GitHub.

How It Works

Contract → worker → done. No retry loops. No review passes. One clean pass with the right context.

01

Generate a CONTRACT.md

Before any worker touches code, generate a structured brief: exact TypeScript interfaces, column names, import paths, SQL conventions, and explicit non-goals. The AI stops exploring and starts executing.

npx upcommander contract "add pagination to notes API"

# Generates CONTRACT.md with:
# - Exact interfaces and types
# - DB column names + SQL conventions
# - Import paths (no guessing)
# - Explicit non-goals
# Takes ~30 seconds, costs ~$0.15
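As an illustration only (the layout, table, and column names below are invented for this example, not actual UpCommander output), a CONTRACT.md for the pagination task might contain sections like:

```markdown
# CONTRACT: add pagination to notes API

## Interfaces
interface NotesPage { notes: Note[]; nextCursor: string | null }

## Conventions
- Table: notes; columns: id, body, created_at
- Import the Supabase client from the project's existing helper module

## Non-goals
- No UI changes
- No schema migration
```

The point is precision: the worker receives exact names and boundaries instead of rediscovering them by reading files.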
02

Run the worker

The K2 worker (Sonnet + CONTRACT + compressed codebase index) executes your task. No retry loops — one clean pass with full context. Typical task: 2–8 minutes.

npx upcommander run "add pagination to notes API"

# Streams output in real time
# Uses L1 codebase index (8K tokens)
# 91% compression vs raw file reads
# Cost: ~$0.08–0.30 depending on task size
03

Review the benchmark data

52+ controlled runs across V1/V2/NS/V2O strategies, multi-model comparison, and cross-vendor grading. All raw JSON in the repo. Replicate it on your own codebase.

# 4 conditions, same task, same model
# V1 (no contract): $5.45, 5/10 quality
# V2 (CONTRACT):    $2.51, 9/10 quality
# + AST compress:   $0.85, 9/10 quality
# + Haiku orch:     $0.83, 9/10 quality
#
# -85% cost. Quality 5/10 → 9/10.

What the Benchmarks Found

52+ runs across 4 strategies, 3 models, and 3 independent graders. Each finding validated at N≥5 before it went in.

-54% cost

CONTRACT.md beats everything else

A structured brief before the task — exact interfaces, column names, import paths, non-goals — reduces cost 54% and raises quality from 5/10 to 9/10. Same model. Just better context.

-91% tokens

AST compression: 91% fewer tokens

Instead of sending raw source files, extract only exported symbols and types using AST parsing. 85–91% token reduction on a real production codebase. Zero quality tradeoff.
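As a rough sketch of the idea (written against the public TypeScript compiler API, not UpCommander's actual implementation), extracting only exported top-level symbols from a source file looks like:

```typescript
import * as ts from "typescript";

// Collect the names of exported top-level declarations in a TS source string.
// This is the essence of a compressed index: exported signatures survive,
// function bodies and private helpers are dropped.
function exportedSymbols(source: string): string[] {
  const sf = ts.createSourceFile("index.ts", source, ts.ScriptTarget.Latest, true);
  const names: string[] = [];
  for (const stmt of sf.statements) {
    const isExported =
      ts.canHaveModifiers(stmt) &&
      ts.getModifiers(stmt)?.some(m => m.kind === ts.SyntaxKind.ExportKeyword);
    if (!isExported) continue;
    if (ts.isFunctionDeclaration(stmt) && stmt.name) {
      names.push(stmt.name.text);
    } else if (
      ts.isInterfaceDeclaration(stmt) ||
      ts.isTypeAliasDeclaration(stmt) ||
      ts.isClassDeclaration(stmt)
    ) {
      if (stmt.name) names.push(stmt.name.text);
    } else if (ts.isVariableStatement(stmt)) {
      for (const d of stmt.declarationList.declarations) {
        if (ts.isIdentifier(d.name)) names.push(d.name.text);
      }
    }
  }
  return names;
}

const src = `
export interface Note { id: string; body: string }
export function listNotes(page: number): Note[] { return [] }
const internalCache = new Map(); // not exported, so not indexed
`;
console.log(exportedSymbols(src)); // → [ "Note", "listNotes" ]
```

A real index would also keep parameter and return types, but even this name-level pass shows why the token count collapses: bodies dominate file size, signatures don't.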

-64% cost

Haiku for boilerplate workers

Haiku + CONTRACT.md scores 9.0/10 at 64% less cost than Sonnet. Scaffolding does most of the work. Route cheap models to boilerplate, Sonnet to design decisions.
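A minimal version of that routing decision might look like the sketch below. The model labels and the keyword heuristic are illustrative assumptions, not UpCommander's actual routing logic:

```typescript
type Task = { description: string; touchesArchitecture: boolean };

// Send boilerplate-shaped work to a cheap model and design-heavy work to a
// stronger one. "haiku" / "sonnet" are placeholder tier names here.
function pickModel(task: Task): string {
  const looksLikeBoilerplate =
    /\b(crud|scaffold|boilerplate|migration|rename)\b/i.test(task.description);
  if (looksLikeBoilerplate && !task.touchesArchitecture) return "haiku";
  return "sonnet";
}

console.log(pickModel({ description: "scaffold CRUD routes for notes", touchesArchitecture: false })); // "haiku"
console.log(pickModel({ description: "redesign auth session model", touchesArchitecture: true }));     // "sonnet"
```

The contract is what makes this safe: a cheap model executing an exact brief fails far less often than a cheap model left to explore.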

-96% orchestration

Lightweight orchestrator

Replace expensive stateful orchestration with a Haiku model reading/writing a compact state file. -96% orchestration cost. Adds only 8% to total end-to-end cost.
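In spirit, the state-file pattern is just read, update one step, write back. The field names and file path below are invented for illustration:

```typescript
import * as fs from "fs";

// Compact orchestration state: small enough that a cheap model can re-read
// and rewrite it every step instead of carrying the whole session in context.
interface OrchestratorState {
  task: string;
  step: number;
  done: string[];
  next: string | null;
}

const STATE_FILE = "orchestrator-state.json"; // illustrative path

// Mark the current step finished and record the next one.
function advance(next: string | null): OrchestratorState {
  const state: OrchestratorState = JSON.parse(fs.readFileSync(STATE_FILE, "utf8"));
  if (state.next) state.done.push(state.next);
  state.step += 1;
  state.next = next;
  fs.writeFileSync(STATE_FILE, JSON.stringify(state, null, 2));
  return state;
}

// Seed the state, then advance through two steps.
fs.writeFileSync(
  STATE_FILE,
  JSON.stringify({ task: "add pagination", step: 0, done: [], next: "generate CONTRACT.md" })
);
advance("run worker");
const final = advance(null);
console.log(final.done); // → [ "generate CONTRACT.md", "run worker" ]
```

Because each step only needs the current state plus the contract, the orchestrator's own context stays tiny, which is where the cost saving comes from.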

What Doesn't Work

These three approaches sound reasonable. The benchmarks show they make things worse.

+73–124% cost

Agent Teams

Every agent loads the full codebase context independently. Three agents = three copies of your 80K-token context. Quality doesn't improve — cost multiplies. Measured across N=5 runs.

9/10 → 6/10 quality

Retry loops

When a model retries, it regenerates entire files instead of making surgical edits. Fixing a broken import path means rewriting the whole route file — and losing the correct CRUD endpoints. Tested across 15 retry attempts.

+56% cost, 0 gain

Opus review pass (V2O)

When the CONTRACT is well-formed, Sonnet already hits 9.8/10. A clean N=5 retest with full Opus one-shot review: same quality, +56% cost. Write a better brief instead.

Get Started in 60 Seconds

Requires Node 18+, an Anthropic API key, and a codebase you want to work on. The CLI is a standalone package — no build step, no account.

# Install

npm install -g @upgpt/upcommander-cli

# Set your Anthropic key

export ANTHROPIC_API_KEY=sk-ant-...

# Generate a contract + run the worker

upcommander run "add pagination to the notes API"

# Or: generate the contract first, review it, then run

upcommander contract "add pagination to the notes API"

upcommander run "add pagination to the notes API"

Open Source, BYOK, No Telemetry

The full orchestration stack — CONTRACT generation, AST index compression, model routing, multi-provider API client (Anthropic, OpenAI, Google, OpenRouter), and benchmark runner — is MIT licensed. Bring your own API keys. All benchmark data is in /evaluations so you can replicate every number.

MIT License · TypeScript · Node 18+ · BYOK · Multi-provider

Want This Running on Your Stack?

UpGPT designs, builds, and runs agentic AI solutions for businesses — applying UpCommander's scaffolding, multi-model routing, and compression stack from day one. You get the optimized setup without the three weeks of benchmark iteration.

Talk to Us About Your Stack

No commitment. We'll tell you whether UpCommander is right for your use case.