Ever watched your coding agent burn through half its context window on pnpm test output? Yeah, me too. Claude Code runs a command, gets back 2,000 lines of test output, and suddenly it can’t remember the file structure it built three messages ago.
That’s the problem tokenjuice solves. And it does it without touching your API calls or modifying your agent’s behavior — it just sits between your terminal commands and the output stream, compressing what’s unnecessary. So your agent keeps more brain space for the actual work.
What Is tokenjuice?
tokenjuice is a CLI tool by Vincent Koc that uses a deterministic rule engine to reduce terminal output before your AI agent reads it. Instead of dumping the full stdout into context, it strips noise, collapses repetitive patterns, and keeps the semantic signal.
Key difference from other tools: it’s rule-driven, not LLM-driven. No tokens spent on deciding what to compress — the rules are pre-defined JSON patterns that know how to handle git status, pnpm test, docker build, rg results, and 20+ other command types. And because it’s deterministic, you always know what you’ll get — no surprises.
It ships with 30+ host integrations out of the box, including Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, OpenHands, and Windsurf. A single command enables it on any of them.
Quick Start — Three Commands
I installed this on my Claude Code setup. Took about 30 seconds:
npm install -g tokenjuice
tokenjuice install claude-code
tokenjuice doctor hooks
That’s it. The doctor hooks command verifies the integration is live — it checks that the hooks are registered and the tokenjuice binary is reachable from your agent’s shell. After that, every command your agent runs gets filtered through tokenjuice’s reducer before the output lands back in context.
What happens internally: tokenjuice intercepts the stdout, applies its rule chain, and strips out lines that match compression patterns — skipped tests, progress bars, paginated output, stack frames from known libs, repetitive delimiter lines. What remains is the semantic payload: test results, error summaries, changed file paths.
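To make the idea concrete, here is a minimal sketch of a deterministic, rule-based line filter. It is not tokenjuice's actual code or rule schema: the `Rule` shape, the rule names, and the regex patterns are all my own illustration, loosely modeled on the noise categories listed above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape for illustration; not tokenjuice's real schema."""
    name: str
    pattern: str  # regex; any line it matches is dropped


# Illustrative patterns only (assumptions, not the built-in rule set).
RULES = [
    Rule("progress-bar", r"^\s*\[[#=\->\s]+\]\s*\d+%"),
    Rule("skipped-test", r"^\s*(SKIP|skipped)\b"),
    Rule("delimiter-line", r"^\s*[-=*_]{5,}\s*$"),
    Rule("library-stack-frame", r"^\s+at .*node_modules/"),
]


def reduce_output(raw: str, rules=RULES) -> str:
    """Drop every line matched by any rule; keep the rest in original order."""
    compiled = [re.compile(rule.pattern) for rule in rules]
    kept = [
        line
        for line in raw.splitlines()
        if not any(pattern.search(line) for pattern in compiled)
    ]
    return "\n".join(kept)


SAMPLE = "\n".join([
    "[=====>    ] 45%",
    "SKIP tests/legacy.test.ts",
    "FAIL tests/math.test.ts > adds numbers",
    "    at add (src/math.ts:3:9)",
    "    at run (node_modules/vitest/dist/index.js:10:5)",
    "----------",
    "Tests: 1 failed, 17 passed",
])

print(reduce_output(SAMPLE))
```

Because the filter is just a fixed list of patterns applied in order, the same input always produces the same output, and no model call is involved. The failure line, the stack frame from the project's own code, and the summary survive; the progress bar, skipped test, library frame, and delimiter do not.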
If you ever need the raw output (debugging a cryptic test failure, say), just use:
tokenjuice wrap --raw -- <command>
Real Results — Before vs After
I ran a few commands through tokenjuice on my personal project to see what happens. Here’s what I found:
| Command | Raw Output | Compressed | Reduction |
|---|---|---|---|
| git status | 847 tokens | 132 tokens | 84% |
| pnpm test (18 test suites) | 4,261 tokens | 812 tokens | 81% |
| docker build --progress=plain | 12,447 tokens | 3,018 tokens | 76% |
| rg "TODO" src/ | 2,134 tokens | 386 tokens | 82% |
But the reduction varies a lot — it depends on how well the command matches the built-in rules. A curl response with JSON formatting? About 50%. A cat on a config file? Closer to 90%.
Still, even at 50%, that’s significant savings over the life of a 50-message agent session. And those savings compound: every round, your agent spends less of its context on noisy output and has more left for actual reasoning.
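If you want to check the arithmetic, here is a small sketch that recomputes the reduction column from the token counts in the table above and then estimates savings over a session. The 1,000 tokens per command in the usage line is a made-up figure for illustration, not something I measured.

```python
# Token counts copied from the table above: (raw, compressed).
MEASUREMENTS = {
    "git status": (847, 132),
    "pnpm test": (4261, 812),
    "docker build --progress=plain": (12447, 3018),
    'rg "TODO" src/': (2134, 386),
}


def reduction_pct(raw: int, compressed: int) -> int:
    """Percentage of tokens removed, rounded to the nearest whole percent."""
    return round(100 * (1 - compressed / raw))


def session_savings(tokens_per_command: int, commands: int, reduction: float) -> int:
    """Tokens kept out of context over a session.

    Simplifying assumption: every command produces the same output size
    and gets the same reduction ratio.
    """
    return int(tokens_per_command * commands * reduction)


for command, (raw, compressed) in MEASUREMENTS.items():
    print(f"{command}: {reduction_pct(raw, compressed)}%")

# Hypothetical session: 50 commands, 1,000 tokens each, 50% reduction.
print(session_savings(1000, 50, 0.5))
```

Under those assumed numbers, the worst-case 50% reduction still keeps 25,000 tokens out of the context window over 50 commands.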
How tokenjuice Fits Into the Bigger Picture
This is where it gets interesting. tokenjuice isn’t just a standalone tool; it’s the third leg of what I’m calling the context management trilogy. Yet most developers I talk to only know about one of these three tools.
| Layer | Tool | What It Compresses | Method | What You Save |
|---|---|---|---|---|
| Input (API calls) | tokdiet | LLM API request bodies | Network proxy | API fee savings |
| Tool Output (MCP) | Context Mode | Tool execution results | MCP sandbox + persistence | Read() call reduction |
| Terminal Output | tokenjuice | CLI command results | Deterministic rule engine | Context window savings |
I covered Context Mode this morning — it reduces Read() calls by caching tool outputs across sessions. And tokdiet handles network-level compression on API requests.
tokenjuice fills the gap neither of them touches: the terminal output that floods back after every command execution. And critically, all three can run simultaneously without conflict — they operate at different layers and don’t even know the others exist. Also, having all three active means you’re optimizing context at every possible pressure point.
What to Watch Out For
tokenjuice isn’t a magic bullet. Here’s what I noticed during my testing:
Rule coverage gaps. The built-in rules cover common commands well. But anything custom — your own CLI tool, a bespoke test runner — gets minimal compression until you write a rule for it. So if your workflow leans on internal tooling, expect to author some JSON patterns. Now, writing JSON rules isn’t hard — it’s just more fiddling than most people want.
False positives can happen. The rule engine is deterministic, which is a feature 90% of the time. But if a compressed output drops context your agent actually needs, you’ll end up calling --raw more often than you’d like. Still, that beats having zero visibility into what got stripped.
Community-driven integrations. The 30+ host integrations are a mix of official and community contributions. Some are better maintained than others. The Claude Code and Codex hooks felt solid. But the less popular ones? I’d test before trusting them in production.
The Rule System — Quick Look
The rules follow a three-layer hierarchy: built-in defaults → user config (~/.config/tokenjuice/rules/) → project-level (.tokenjuice/rules/ in your repo). Each layer overrides the previous one.
So you can ship a .tokenjuice/ folder in your project repo with custom rules for your team’s test runner. And everyone who clones it gets those rules automatically. Neat.
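Here is a rough sketch of how that kind of layered override could be resolved. I haven't read tokenjuice's loader, so treat the details as assumptions: that rules are JSON files keyed by file name, and that a higher layer replaces a same-named rule wholesale rather than merging fields. The rule names and the `drop` key are invented for the example.

```python
import json
from pathlib import Path


def load_layer(directory: Path) -> dict:
    """Load every *.json file in `directory`, keyed by file stem.

    A missing directory is treated as an empty layer.
    """
    if not directory.is_dir():
        return {}
    return {
        path.stem: json.loads(path.read_text())
        for path in sorted(directory.glob("*.json"))
    }


def resolve_rules(*layers: dict) -> dict:
    """Merge layers given from lowest to highest priority.

    A later layer replaces a same-named rule from an earlier one
    (assumed behavior, not confirmed against tokenjuice itself).
    """
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical rule contents, inlined so the example runs without any files.
builtin = {"pnpm-test": {"drop": ["skipped"]}, "git-status": {"drop": ["hints"]}}
user = {"git-status": {"drop": []}}
project = {"our-runner": {"drop": ["banner"]}}

effective = resolve_rules(builtin, user, project)
print(sorted(effective))
```

In a real setup you would feed `load_layer` the user directory (`~/.config/tokenjuice/rules/`) and the project directory (`.tokenjuice/rules/`) and pass the results to `resolve_rules` in that order, after the built-in defaults.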
Bottom Line
tokenjuice is a sharp, focused tool that solves a very specific problem: your agent drowning in terminal output. It’s simple to install, has integrations for the major coding agents, and pairs naturally with tokdiet and Context Mode for a full-stack context strategy.
So here’s my recommendation: if you’re running Claude Code, Codex, or Cursor and you’ve ever watched the token counter climb after a simple pnpm test, give it a shot. The quick start is literally three commands.