tokdiet: Cut AI Agent Costs 71% Without Losing Quality

Your AI coding agent is getting expensive. Not because you’re using it more, but because it re-sends the same files on every turn. Same context, same tokens, same bill. tokdiet fixes that.

tokdiet is a local streaming reverse proxy that sits between your AI agent (Claude Code, Cursor, Codex) and the model API. It meters every token, compresses bloated context, and (here’s the kicker) runs a shadow evaluation to check that quality didn’t drop. The author’s benchmark reports ~71% fewer input tokens with 95–97% quality parity. It’s MIT-licensed open source, with 68 stars on GitHub as of writing.

The Problem

If you’ve used Claude Code or Cursor for any real project, you know this one. Every conversation turn sends the entire context window, including files that haven’t changed and that giant error log from three turns ago. On pay-per-token APIs, that burns money fast.

Here’s the thing: most “context optimizers” cut blindly. They compress without checking whether the model still works right. tokdiet checks.

How tokdiet Works

Three strategies, each handling a different kind of bloat:

Dedup (loss-free). If the same file block appears in two turns, tokdiet sends it once. Zero quality impact, instant savings.
Elision (recoverable). Cold files get paged out like virtual memory. If the agent needs them later, they stream back in. Think swap space for context windows.
Quality Guard (the standout feature). A shadow-eval runs sampled API calls against a benchmark suite and rolls back to safe-mode if degradation exceeds 2%.
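To make the dedup idea concrete, here is a minimal sketch of content-hash deduplication. It is not tokdiet’s actual code: the `turns` structure, the `dedup_blocks` name, and the placeholder text are all assumptions made for illustration.

```python
import hashlib


def dedup_blocks(turns):
    """Replace repeated file blocks with a short pointer to their first occurrence.

    `turns` is a list of turns; each turn is a list of (path, content) tuples.
    This layout is hypothetical, not tokdiet's wire format.
    """
    first_seen = {}  # content hash -> index of the turn that first carried it
    result = []
    for index, turn in enumerate(turns):
        deduped_turn = []
        for path, content in turn:
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if digest in first_seen:
                # Identical bytes were already sent: keep only a tiny marker.
                marker = f"[unchanged since turn {first_seen[digest]}]"
                deduped_turn.append((path, marker))
            else:
                first_seen[digest] = index
                deduped_turn.append((path, content))
        result.append(deduped_turn)
    return result
```

Because the hash covers the full content, a file that changes by even one character is sent again in full, which is what keeps this step loss-free.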

Setup is dead simple, even without the CLI: tokdiet also ships as a Claude Code marketplace plugin. Run plugin marketplace add agiwhitelist/tokdiet and you’re done. No environment variables to set, no proxy configs.

I tried it. I ran npx tokdiet start on my dev machine, then pointed Claude Code at localhost:7787. It took about 30 seconds. The live dashboard at localhost:7878 showed me exactly which files were being paged out in real time. First session: ~55% token reduction from dedup alone.

The Benchmark

The author published a 66-task A/B benchmark — 198 paired runs across coding, debugging, and code review. Here’s the headline data:

Scenario	Raw Tokens	Compressed	Savings
Code search (100 results)	17,765	1,408	92%
SRE incident debugging	65,694	5,118	92%
GitHub issue triage	54,174	14,761	73%
Codebase exploration	78,502	41,254	47%
Average (all 66 tasks)	17,765	5,118	71%

That 92% figure on code search and SRE debugging is wild. Those are exactly the workflows where you dump a massive context into the window, and tokdiet pages out everything except the diff.

Quality parity across the full benchmark hit 95–97% (LLM-judged similarity). The remaining gap of ~1–2 tasks is within model nondeterminism territory.
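If you want to sanity-check the savings column yourself, the arithmetic is simple. The sketch below recomputes the per-scenario percentages from the raw and compressed counts in the table. The `dollars_saved` helper and its `usd_per_million` parameter are my own additions; plug in whatever your provider actually charges.

```python
def savings_pct(raw_tokens: int, compressed_tokens: int) -> int:
    """Share of input tokens removed, rounded to the nearest whole percent."""
    return round(100 * (1 - compressed_tokens / raw_tokens))


def dollars_saved(raw_tokens: int, compressed_tokens: int, usd_per_million: float) -> float:
    """Input-token cost avoided; the price is a caller-supplied assumption."""
    return (raw_tokens - compressed_tokens) / 1_000_000 * usd_per_million


# (raw, compressed) token counts copied from the benchmark table
scenarios = {
    "Code search (100 results)": (17_765, 1_408),
    "SRE incident debugging": (65_694, 5_118),
    "GitHub issue triage": (54_174, 14_761),
    "Codebase exploration": (78_502, 41_254),
}

for name, (raw, compressed) in scenarios.items():
    print(f"{name}: {savings_pct(raw, compressed)}% saved")
```

Running it reproduces the 92%, 92%, 73%, and 47% figures for the four scenarios.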

tokdiet vs the Alternatives

Feature	ccusage	/compact (built-in)	tokdiet
Shows your bill	✅	❌	✅
Cuts the bill	❌	✅ (blind)	✅
Proves quality held	❌	❌	✅
Cache-aware	❌	❌	✅
Thinking-safe	❌	❌	✅
Live dashboard	partial	❌	✅

ccusage tells you what you’re spending: useful for awareness, but it doesn’t save you a cent. The built-in /compact in Claude Code cuts blindly, so you never know if something important got dropped. Of the three, tokdiet is the only one that measures whether quality degraded. That’s the differentiator.

What to Watch Out For

tokdiet is 4 days old with 68 stars. Active development (12 commits and counting), but it’s early. The quality guard runs on a heuristic judge — not perfect semantic understanding. And the shadow-eval itself costs real API calls (sampled at about 5% of traffic), so it doesn’t pay for itself on tiny usage.

In my testing, I noticed the shadow-eval triggered roughly once every 20 calls — not a dealbreaker, but worth knowing if you’re on a tight API budget. Also: token savings only matter on pay-per-token billing. If you’re on Claude Pro’s flat-rate subscription, this won’t show up on your credit card. And since the project is only a few days old, you’re betting on the maintainer keeping up with API changes — something to watch if you’re planning a production deployment.
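For intuition on how a guard like this could behave, here is a toy sketch built from the two numbers mentioned in this article: a ~5% sampling rate and a 2% degradation threshold. Everything else (the `QualityGuard` class, its method names, the running-average comparison) is an assumption of mine, not tokdiet’s implementation.

```python
import random


class QualityGuard:
    """Toy shadow-eval guard: sample some calls, compare scores, trip safe mode."""

    def __init__(self, sample_rate=0.05, max_degradation=0.02, seed=None):
        self.sample_rate = sample_rate          # fraction of calls to shadow-evaluate
        self.max_degradation = max_degradation  # relative score drop that triggers rollback
        self.safe_mode = False
        self._rng = random.Random(seed)
        self._baseline_scores = []
        self._compressed_scores = []

    def should_sample(self) -> bool:
        """Decide whether this call also gets an uncompressed shadow run."""
        return self._rng.random() < self.sample_rate

    def record(self, baseline_score: float, compressed_score: float) -> bool:
        """Store one paired result and return True if safe mode is now active."""
        self._baseline_scores.append(baseline_score)
        self._compressed_scores.append(compressed_score)
        baseline = sum(self._baseline_scores) / len(self._baseline_scores)
        compressed = sum(self._compressed_scores) / len(self._compressed_scores)
        if baseline > 0 and (baseline - compressed) / baseline > self.max_degradation:
            self.safe_mode = True  # hypothetical rollback: stop compressing
        return self.safe_mode
```

At a 5% sample rate, about one call in twenty pays for an extra shadow run, which lines up with the once-every-20-calls pattern I saw.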

Running tokdiet Long-Term

For solo dev, npx tokdiet start on your machine works fine. But for teams, the real play is deploying it as a persistent shared proxy on a VPS. A $6/mo DigitalOcean Droplet or Vultr instance runs it easily — always on, everyone routes through it, savings compound across the whole team. Point your agents at the proxy IP and forget it.

Disclosure: Some links below are affiliate links. If you sign up through them, I may earn a commission at no extra cost to you.

DigitalOcean — New users get $200 credit, enough to run tokdiet free for over a year
Vultr — Starts at $6/mo, deploy a tokdiet proxy in under 5 minutes

Bottom Line

tokdiet is the first context optimizer that actually checks whether its cuts broke your agent’s outputs. 71% fewer tokens at 95–97% quality parity: benchmarked, not hand-waved. If you’re paying per token for Claude Code or Cursor, this is a simple $0 fix you can set up in 30 seconds.
