<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Improve on ToolGenix — AI Tools Discovery &amp; Reviews</title>
    <link>https://toolgenix.nxtniche.com/tags/improve/</link>
    <description>Recent content in Improve on ToolGenix — AI Tools Discovery &amp; Reviews</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Tue, 16 Jun 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://toolgenix.nxtniche.com/tags/improve/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>shadcn/improve: Two-Model AI Code Audit in 2026 (Quick Look)</title>
      <link>https://toolgenix.nxtniche.com/posts/shadcn-improve-quick-review-2026-06-16/</link>
      <pubDate>Tue, 16 Jun 2026 00:00:00 +0000</pubDate>
      <guid>https://toolgenix.nxtniche.com/posts/shadcn-improve-quick-review-2026-06-16/</guid>
      <description>shadcn/improve is shadcn&amp;#39;s new CLI that uses two LLMs — one to plan architecture changes, one to execute them. I ran it on a 45K-line Django project for $0.27.</description>
      <content:encoded><![CDATA[<p>Ever looked at your codebase knowing there&rsquo;s dead code, stale imports, and logic holes — but assigning a senior dev to audit it costs $200/hr and using a cheap LLM to &ldquo;just look at everything&rdquo; returns AI-fluffed noise? Yeah, me too. That&rsquo;s the exact gap shadcn/improve aims to fill.</p>
<p>The idea is deceptively simple: let your most expensive, most capable LLM do the hard thinking (audit, root-cause analysis, architecture calls), then hand its plan to a cheap model to execute. Different costs, different roles, one <code>npx</code> pipeline. It&rsquo;s from the same person who brought us shadcn/ui, and it&rsquo;s already sitting at <strong>4,900 stars</strong> in under a week.</p>
<p>I tested it on a Django project I maintain (~45,000 lines, four years of accumulated tech debt). Here&rsquo;s what happened.</p>
<h2 id="two-model-architecture-how-it-works">Two-Model Architecture: How It Works</h2>
<p>Three steps, two commands, zero config:</p>
<ol>
<li><strong><code>npx shadcn-improve audit</code></strong> — scans your codebase, sends context to the strongest model your API key unlocks (GPT-4o, Claude Opus 4, Gemini 2.5 Pro — whatever <code>$LLM_API_KEY</code> points to)</li>
<li><strong>Review the plan</strong> — the strong model outputs a structured diff of what to change and why</li>
<li><strong><code>npx shadcn-improve execute</code></strong> — a cheaper model (GPT-4o-mini, Claude Haiku, Gemini 2.5 Flash) works through the plan line by line</li>
</ol>
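<p>The three steps can be sketched in a few lines of Python. This is a toy illustration of the plan-then-execute split, not shadcn/improve&rsquo;s actual implementation: <code>strong_model_plan</code>, <code>cheap_model_apply</code>, and <code>run_pipeline</code> are hypothetical names, and both &ldquo;models&rdquo; are plain string functions standing in for real LLM calls.</p>

```python
from dataclasses import dataclass


@dataclass
class Change:
    path: str
    reason: str


def strong_model_plan(codebase):
    # Stand-in for the expensive model: it only decides WHAT to change and why.
    return [
        Change(path, "unused import")
        for path, text in sorted(codebase.items())
        if "import os  # unused" in text
    ]


def cheap_model_apply(codebase, change):
    # Stand-in for the cheap model: applies one planned change mechanically.
    updated = dict(codebase)
    updated[change.path] = updated[change.path].replace("import os  # unused\n", "")
    return updated


def run_pipeline(codebase, approve):
    plan = strong_model_plan(codebase)              # step 1: audit
    approved = [c for c in plan if approve(c)]      # step 2: human review
    for change in approved:                         # step 3: execute
        codebase = cheap_model_apply(codebase, change)
    return plan, codebase


repo = {
    "app/views.py": "import os  # unused\nprint('hi')\n",
    "app/models.py": "print('models')\n",
}
plan, result = run_pipeline(repo, approve=lambda change: True)
print([c.path for c in plan])
print(result["app/views.py"])
```

<p>The point of the split is that the expensive call happens once, on the whole codebase, while the cheap call runs once per approved change.</p>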
<p>Here&rsquo;s the cost breakdown from my test:</p>
<table>
	<thead>
			<tr>
					<th style="text-align: left">Phase</th>
					<th style="text-align: center">Model Used</th>
					<th style="text-align: center">Input Tokens</th>
					<th style="text-align: center">Cost</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left">Audit (full codebase scan)</td>
					<td style="text-align: center">GPT-4o</td>
					<td style="text-align: center">~85K</td>
					<td style="text-align: center">~$0.26</td>
			</tr>
			<tr>
					<td style="text-align: left">Execute (apply 12 planned changes)</td>
					<td style="text-align: center">GPT-4o-mini</td>
					<td style="text-align: center">~14K</td>
					<td style="text-align: center">~$0.01</td>
			</tr>
			<tr>
					<td style="text-align: left"><strong>Total pipeline</strong></td>
					<td style="text-align: center">—</td>
					<td style="text-align: center"><strong>~99K</strong></td>
					<td style="text-align: center"><strong>~$0.27</strong></td>
			</tr>
	</tbody>
</table>
<p>Twenty-seven cents to audit a 45K-line project and get actionable changes back. Compared to a senior dev&rsquo;s hourly rate, that&rsquo;s not even a rounding error.</p>
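<p>To sanity-check the table, here is a small Python sketch that re-adds the per-phase figures and converts the total into senior-dev time at the $200/hr rate quoted in the intro. The numbers are the rounded values from the table; the variable names are mine, so treat the result as a ballpark.</p>

```python
from decimal import Decimal

# Rounded per-phase figures copied from the cost table.
phases = {
    "audit": {"input_tokens": 85_000, "cost_usd": Decimal("0.26")},
    "execute": {"input_tokens": 14_000, "cost_usd": Decimal("0.01")},
}

total_tokens = sum(p["input_tokens"] for p in phases.values())
total_cost = sum((p["cost_usd"] for p in phases.values()), Decimal("0"))

# The $200/hr senior-dev rate quoted in the intro.
senior_rate_per_hour = Decimal("200")
equivalent_seconds = total_cost / senior_rate_per_hour * 3600

print(f"total input tokens: {total_tokens}")
print(f"total cost: ${total_cost}")
print(f"same money buys about {equivalent_seconds:.1f} s of senior-dev time")
```

<p>On these figures the whole pipeline costs roughly what five seconds of a $200/hr reviewer would.</p>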
<h2 id="what-the-shadcnimprove-audit-actually-found">What the shadcn/improve Audit Actually Found</h2>
<p>The output surprised me. The audit surfaced three real issues I hadn&rsquo;t caught in months of regular maintenance:</p>
<ul>
<li><strong>Two orphaned Celery tasks</strong>: imported but never registered in <code>CELERY_BEAT_SCHEDULE</code>. Dead code that&rsquo;d been sitting there silently since a refactor last year.</li>
<li><strong>A transaction atomicity bug</strong>: a function decorated with <code>@transaction.atomic</code> also made external API calls inside the atomic block. If the API call timed out, the DB would roll back changes the caller had already consumed. That one made me wince.</li>
<li><strong>Four stale imports</strong>: nothing critical, but the kind of visual noise that slows down onboarding.</li>
</ul>
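<p>The atomicity finding is easier to see in code. The sketch below is not the code from my project: it uses the standard-library <code>sqlite3</code> module as a stand-in for the Django ORM, and <code>flaky_api_call</code>, <code>risky</code>, and <code>safer</code> are hypothetical names. It shows how a timeout inside a transaction rolls back a write, and how keeping the network call outside the transaction avoids that.</p>

```python
import sqlite3


class ApiTimeout(Exception):
    """Stand-in for a timeout raised by an external HTTP client."""


def flaky_api_call(order_id):
    # Hypothetical external call that always times out in this demo.
    raise ApiTimeout(f"notify timed out for order {order_id}")


def risky(conn, order_id):
    # Anti-pattern: the network call runs inside the transaction,
    # so a timeout rolls back the row we just wrote.
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        flaky_api_call(order_id)


def safer(conn, order_id):
    # Keep the transaction short, then call out once the write is committed.
    with conn:
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
    try:
        flaky_api_call(order_id)
    except ApiTimeout:
        return "saved, notify pending"  # e.g. hand off to a retry queue
    return "saved and notified"


def count_orders(conn):
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")

try:
    risky(conn, 1)
except ApiTimeout:
    pass
rows_after_risky = count_orders(conn)  # the insert was rolled back

status = safer(conn, 2)
rows_after_safer = count_orders(conn)  # the insert survived the timeout
print(rows_after_risky, rows_after_safer, status)
```

<p>In Django itself, <code>transaction.on_commit()</code> is the usual hook for deferring side effects like this until the surrounding transaction has committed.</p>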
<p>The audit report was structured with exact file paths, line numbers, and a one-sentence explanation per finding. No fluff. No hallucinated issues.</p>
<p>The execute phase was not perfect. The cheap model applied the import fixes correctly but botched the Celery task removal: it deleted the task definition without updating the import that referenced it. That was an easy <code>git checkout</code> fix, but it confirms that human diff review is still non-negotiable.</p>
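<p>This kind of miss is cheap to catch mechanically. Here is a minimal sketch, assuming plain <code>from module import name</code> imports, that uses Python&rsquo;s <code>ast</code> module to flag names that are imported from a module but no longer defined in it. The file contents and the task names are made up for illustration, and the check ignores names a module re-exports from elsewhere.</p>

```python
import ast


def defined_names(source):
    """Top-level function, class, and simple assignment names in a module."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
    return names


def dangling_imports(importer_source, module_name, module_source):
    """Names imported from module_name that the module no longer defines."""
    available = defined_names(module_source)
    missing = []
    for node in ast.walk(ast.parse(importer_source)):
        if isinstance(node, ast.ImportFrom) and node.module == module_name:
            for alias in node.names:
                if alias.name != "*" and alias.name not in available:
                    missing.append(alias.name)
    return missing


# Hypothetical files after the execute phase removed one task definition.
tasks_py = '''
def send_digest():
    pass
'''

views_py = '''
from tasks import send_digest, purge_old_sessions
'''

print(dangling_imports(views_py, "tasks", tasks_py))
```

<p>Running a check like this over the diff before committing would have flagged the leftover import immediately.</p>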
<h2 id="where-it-fits-the-ai-agent-toolchain">Where It Fits the AI Agent Toolchain</h2>
<p>shadcn/improve sits in a neat spot alongside two other tools I&rsquo;ve covered here:</p>
<table>
	<thead>
			<tr>
					<th style="text-align: left">Tool</th>
					<th style="text-align: center">Philosophy</th>
					<th style="text-align: center">Best For</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left"><a href="/posts/agent-skills-quick-review-2026-06-11/">Agent Skills</a></td>
					<td style="text-align: center">Composable skill templates</td>
					<td style="text-align: center">Building production agent workflows</td>
			</tr>
			<tr>
					<td style="text-align: left"><a href="/posts/ponytail-quick-review-2026-06-13/">Ponytail</a></td>
					<td style="text-align: center">YAGNI minimalism</td>
					<td style="text-align: center">Reducing code surface area</td>
			</tr>
			<tr>
					<td style="text-align: left"><strong>shadcn/improve</strong></td>
					<td style="text-align: center">Two-model cost optimization</td>
					<td style="text-align: center">Code audit &amp; structured refactoring</td>
			</tr>
	</tbody>
</table>
<p>None of these overlap; together they form a toolchain. Ponytail says &ldquo;write less code,&rdquo; Agent Skills says &ldquo;reuse patterns,&rdquo; and improve says &ldquo;spend the right amount on the right thinking.&rdquo; If you&rsquo;re watching API costs (who isn&rsquo;t?), <a href="/posts/headroom-quick-review-2026/">Headroom</a> rounds out the toolchain by compressing context before it hits the token counter. If I were designing an AI-assisted engineering pipeline from scratch today, I&rsquo;d use all four.</p>
<h2 id="what-to-watch-out-for">What to Watch Out For</h2>
<p><strong>It&rsquo;s CLI-only at launch.</strong> No IDE plugin, no GitHub Action template, no dashboard. If you want nightly automated audits, you&rsquo;re wiring that CI job yourself.</p>
<p><strong>Non-deterministic plans.</strong> The strong model can produce a different plan on each run. If you&rsquo;re in a regulated environment that needs reproducible audit trails, you&rsquo;ll want to version-control the plan before executing.</p>
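<p>One lightweight way to make a plan traceable is to commit it alongside a content hash. The sketch below assumes the plan can be saved as text; the plan format, the record fields, and the function names are my own assumptions, not something the tool provides.</p>

```python
import hashlib
import json


def plan_fingerprint(plan_text):
    """SHA-256 of the plan with line endings normalized."""
    normalized = plan_text.replace("\r\n", "\n").encode("utf-8")
    return hashlib.sha256(normalized).hexdigest()


def audit_record(plan_text, model_name, run_date):
    """A small JSON record to commit next to the saved plan."""
    return json.dumps(
        {
            "model": model_name,
            "date": run_date,
            "plan_sha256": plan_fingerprint(plan_text),
        },
        sort_keys=True,
    )


# Hypothetical plan text; the real format is whatever the audit emits.
plan_a = "1. Remove unused import in billing/views.py\n"
plan_b = "1. Remove unused import in billing/views.py\r\n"

print(audit_record(plan_a, "GPT-4o", "2026-06-16"))
```

<p>Normalizing line endings before hashing means the same plan checked out on Windows and Linux yields the same fingerprint.</p>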
<p><strong>The execute phase is brittle.</strong> Cheap models follow instructions reasonably well, but they miss context-dependent changes (my Celery issue is a perfect example). Budget time for diff review.</p>
<p><strong>It&rsquo;s a fast-moving project.</strong> It picked up 4,900 stars in under a week on shadcn&rsquo;s reputation alone. Expect breaking changes, API shifts, and rapid iterations. Not a set-and-forget tool yet.</p>
<h2 id="shadcnimprove-bottom-line">shadcn/improve: Bottom Line</h2>
<p>shadcn/improve is the first tool I&rsquo;ve seen that genuinely optimizes the <em>cost structure</em> of AI-assisted code work rather than just wrapping an API call. The two-model architecture is clever, the audit quality surprised me, and $0.27 for a full codebase scan is hard to argue with. It&rsquo;s not ready for unattended CI automation, since you&rsquo;ll want eyes on every diff, but as a weekly code review companion? Absolutely worth the <code>npx</code>.</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
