Ever looked at your codebase knowing there’s dead code, stale imports, and logic holes — but assigning a senior dev to audit it costs $200/hr and using a cheap LLM to “just look at everything” returns AI-fluffed noise? Yeah, me too. That’s the exact gap shadcn/improve aims to fill.

The idea is deceptively simple: let your most expensive, most capable LLM do the hard thinking (audit, root-cause analysis, architecture calls), then hand its plan to a cheap model to execute. Different costs, different roles, one npx pipeline. It comes from the same person who brought us shadcn/ui, and it’s already sitting at 4,900 stars in under a week.

I tested it on a Django project I maintain (~45,000 lines, four years of accumulated tech debt). Here’s what happened.

Two-Model Architecture: How It Works

Three commands, zero config:

  1. npx shadcn-improve audit — scans your codebase, sends context to the strongest model your API key unlocks (GPT-4o, Claude Opus 4, Gemini 2.5 Pro — whatever $LLM_API_KEY points to)
  2. Review the plan — the strong model outputs a structured diff of what to change and why
  3. npx shadcn-improve execute — a cheaper model (GPT-4o-mini, Claude Haiku, Gemini 2.5 Flash) works through the plan line by line
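To make the division of labor concrete, here is a minimal sketch of the plan-then-execute pattern in Python. It is not shadcn/improve’s actual implementation: `PlannedChange`, `run_pipeline`, and the three callables are hypothetical names invented for illustration, and the models are passed in as plain functions so the sketch stays independent of any vendor SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PlannedChange:
    """One entry in the strong model's plan (hypothetical shape)."""
    path: str
    line: int
    instruction: str


def run_pipeline(
    files: Dict[str, str],
    plan_with_strong_model: Callable[[Dict[str, str]], List[PlannedChange]],
    apply_with_cheap_model: Callable[[PlannedChange], str],
    approve: Callable[[PlannedChange], bool],
) -> List[str]:
    """Audit with the expensive model, review, then execute with the cheap one."""
    # Step 1: the strong model sees the whole codebase once and writes the plan.
    plan = plan_with_strong_model(files)
    # Step 2: a human (or a policy) filters the plan before anything is applied.
    approved = [change for change in plan if approve(change)]
    # Step 3: the cheap model only gets one narrow instruction at a time.
    return [apply_with_cheap_model(change) for change in approved]
```

The point of the shape is that the expensive call happens once, while the cheap calls scale with the number of approved changes.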

Here’s the cost breakdown from my test:

| Phase | Model Used | Input Tokens | Cost |
|---|---|---|---|
| Audit (full codebase scan) | GPT-4o | ~85K | ~$0.26 |
| Execute (apply 12 planned changes) | GPT-4o-mini | ~14K | ~$0.01 |
| Total pipeline | | ~99K | ~$0.27 |

That’s twenty-seven cents to audit a 45K-line project and get actionable changes back. Compared to a senior dev’s hourly rate, that’s not even a rounding error.
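For a rough sanity check on those numbers, the input-token share of the bill can be estimated in a few lines. The per-token prices below are assumptions (list prices as I understand them at the time of writing; check your provider’s current pricing), and the table only reports input tokens, so the gap between this estimate and the reported cost is presumably output tokens.

```python
# Assumed list prices in USD per 1M input tokens. These are assumptions,
# not figures from the tool; verify against current provider pricing.
ASSUMED_INPUT_PRICE_PER_MTOK = {
    "gpt-4o": 2.50,
    "gpt-4o-mini": 0.15,
}


def input_cost(model: str, input_tokens: int) -> float:
    """Estimate the input-token cost in USD for one phase."""
    return input_tokens / 1_000_000 * ASSUMED_INPUT_PRICE_PER_MTOK[model]


audit = input_cost("gpt-4o", 85_000)          # about 0.2125
execute = input_cost("gpt-4o-mini", 14_000)   # about 0.0021

print(f"audit input:   ${audit:.4f}")
print(f"execute input: ${execute:.4f}")
print(f"total input:   ${audit + execute:.4f}")
```

Under these assumed prices, input tokens account for roughly $0.21 of the ~$0.27 total, which is consistent with the table once output tokens are added on top.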

What the shadcn/improve Audit Actually Found

The output surprised me. The audit surfaced three real issues I hadn’t caught in months of regular maintenance:

  • Two orphaned Celery tasks — imported but never registered in CELERY_BEAT_SCHEDULE. Dead code that’d been sitting there silently since a refactor last year.
  • A transaction atomicity bug: a transaction.atomic() decorator on a function that also made external API calls inside the block. If the API call timed out, the DB would roll back changes the caller already consumed. That one made me wince.
  • Four stale imports — nothing critical, but the kind of visual noise that slows down onboarding.
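The atomicity finding is easier to see in running code. The sketch below uses the standard library’s sqlite3 connection as a stand-in for Django’s transaction.atomic() (both commit on success and roll back on an exception), so it runs without a Django project. The table, function names, and the always-failing `flaky_payment_api` are hypothetical; this is not the code from my project.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a tiny orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.commit()
    return conn


def flaky_payment_api(order_id: int) -> None:
    """Stand-in for an external HTTP call that times out."""
    raise TimeoutError(f"payment API timed out for order {order_id}")


def create_order_unsafe(conn: sqlite3.Connection, order_id: int) -> None:
    # Problem pattern: the external call lives inside the transaction,
    # so a timeout rolls back the row that was just written.
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO orders VALUES (?, 'pending')", (order_id,))
        flaky_payment_api(order_id)


def create_order_safer(conn: sqlite3.Connection, order_id: int) -> None:
    # Safer pattern: commit the DB work first, then call out, and record
    # the failure in a second, separate transaction.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'pending')", (order_id,))
    try:
        flaky_payment_api(order_id)
    except TimeoutError:
        with conn:
            conn.execute(
                "UPDATE orders SET status = 'payment_retry' WHERE id = ?",
                (order_id,),
            )


def count_orders(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

In Django itself, a common way to get the second behavior is to defer the external call with transaction.on_commit(), so it only runs once the block has committed.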

The audit report was structured with exact file paths, line numbers, and a one-sentence explanation per finding. No fluff. No hallucinated issues.
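For comparison, here is what a finding in that shape (file, line, one-sentence explanation) looks like when produced deterministically for the simplest category, stale imports. This is a sketch using Python’s ast module, not how shadcn/improve detects anything; `find_unused_imports` is a hypothetical helper, and it deliberately ignores edge cases such as `__all__` re-exports and names used only inside strings.

```python
import ast
from typing import Dict, List


def find_unused_imports(source: str, filename: str = "<string>") -> List[Dict]:
    """Report names that are imported but never referenced in the module."""
    tree = ast.parse(source, filename=filename)

    imported = {}  # bound name -> line number of the import statement
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a" unless an alias is given.
                imported[alias.asname or alias.name.split(".")[0]] = node.lineno
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported[alias.asname or alias.name] = node.lineno

    # Every bare identifier that appears anywhere in the module.
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}

    return [
        {"file": filename, "line": line,
         "finding": f"'{name}' is imported but never used"}
        for name, line in sorted(imported.items(), key=lambda kv: kv[1])
        if name not in used
    ]
```

A linter catches this category for free; the interesting findings in my run were the ones that needed cross-file reasoning, which is where the strong model earned its cost.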

The execute phase? Not perfect. The cheap model applied the import fixes correctly but messed up the Celery task removal: it deleted the task definition without updating the import reference. That was an easy fix with git checkout, but it confirms that human diff review is still non-negotiable.
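That specific failure, a definition removed while a `from ... import` still points at it, is cheap to catch mechanically before a human even opens the diff. The sketch below is one way to do it with the ast module, assuming you can hand it a mapping of module names to source text; `top_level_names` and `dangling_imports` are hypothetical helpers, and it only checks absolute imports between the modules you pass in.

```python
import ast
from typing import Dict, List, Set, Tuple


def top_level_names(source: str) -> Set[str]:
    """Names a module defines or binds at top level."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            names.update(t.id for t in node.targets if isinstance(t, ast.Name))
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            names.update(a.asname or a.name.split(".")[0] for a in node.names)
    return names


def dangling_imports(modules: Dict[str, str]) -> List[Tuple[str, int, str, str]]:
    """Find 'from X import name' where X no longer defines name.

    Returns (importing module, line, imported-from module, missing name).
    """
    defined = {name: top_level_names(src) for name, src in modules.items()}
    problems = []
    for importer, source in modules.items():
        for node in ast.walk(ast.parse(source)):
            # Only absolute imports of modules we were given are checked.
            if (isinstance(node, ast.ImportFrom) and node.level == 0
                    and node.module in defined):
                for alias in node.names:
                    if alias.name != "*" and alias.name not in defined[node.module]:
                        problems.append(
                            (importer, node.lineno, node.module, alias.name))
    return problems
```

Running a check like this after the execute phase would have flagged my broken Celery import immediately, though it is a complement to diff review rather than a replacement.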

Where It Fits the AI Agent Toolchain

shadcn/improve sits in a neat spot alongside two other tools I’ve covered here:

| Tool | Philosophy | Best For |
|---|---|---|
| Agent Skills | Composable skill templates | Building production agent workflows |
| Ponytail | YAGNI minimalism | Reducing code surface area |
| shadcn/improve | Two-model cost optimization | Code audit & structured refactoring |

None of these overlap; they form a toolchain. Ponytail says “write less code,” Agent Skills says “reuse patterns,” and improve says “spend the right amount on the right thinking.” If you’re watching API costs (who isn’t?), Headroom rounds out the toolchain by compressing context before it hits the token counter. If I were designing an AI-assisted engineering pipeline from scratch today, I’d use all four.

What to Watch Out For

It’s CLI-only at launch. No IDE plugin, no GitHub Action template, no dashboard. If you want nightly automated audits, you’re wiring that CI job yourself.

Non-deterministic plans. The strong model writes a different plan each run. If you’re in a regulated environment that needs reproducible audit trails, you’ll want to version-control the plan before executing.
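One lightweight way to pin down which plan was actually executed is to fingerprint the plan text and commit that record next to it. The sketch below assumes you have the plan available as a string; `plan_record` and its field names are hypothetical, not part of the tool.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict


def plan_record(plan_text: str, model: str) -> Dict[str, str]:
    """Build an audit-trail entry for one generated plan."""
    # The same plan text always yields the same digest, so a reviewer can
    # verify that the executed plan is the one that was approved.
    digest = hashlib.sha256(plan_text.encode("utf-8")).hexdigest()
    return {
        "sha256": digest,
        "model": model,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Storing the record as JSON in the same commit as the plan gives you a reproducible trail even though regenerating the plan would not.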

The execute phase is brittle. Cheap models follow instructions reasonably well, but they miss context-dependent changes (my Celery issue is a perfect example). Budget time for diff review.

It’s a fast-moving project. 4,900 stars in a week from shadcn’s reputation alone. Expect breaking changes, API shifts, and rapid iterations. Not a set-and-forget tool yet.

shadcn/improve: Bottom Line

shadcn/improve is the first tool I’ve seen that genuinely optimizes the cost structure of AI-assisted code work rather than just wrapping an API call. The two-model architecture is clever, the audit quality surprised me, and $0.27 for a full codebase scan is hard to argue with. It’s not ready for unattended CI automation, since you’ll want eyes on every diff. But as a weekly code review companion? Absolutely worth the npx.

Disclosure: Some links below are affiliate links. If you sign up through them, I may earn a commission at no extra cost to you.