<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Autonomous Agents on ToolGenix — Open-Source AI &amp; Developer Tools: Honest Hands-On Reviews</title><link>https://toolgenix.nxtniche.com/tags/autonomous-agents/</link><description>Recent content in Autonomous Agents on ToolGenix — Open-Source AI &amp; Developer Tools: Honest Hands-On Reviews</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 03 Jul 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://toolgenix.nxtniche.com/tags/autonomous-agents/index.xml" rel="self" type="application/rss+xml"/><item><title>Muteki: The Open-Source Multi-Agent Swarm That Won CTF Competitions</title><link>https://toolgenix.nxtniche.com/posts/evening-2026-07-03-muteki-quick-review/</link><pubDate>Fri, 03 Jul 2026 00:00:00 +0000</pubDate><guid>https://toolgenix.nxtniche.com/posts/evening-2026-07-03-muteki-quick-review/</guid><description>Muteki is an open-source multi-model AI agent swarm that coordinates Claude Code, Codex, and Cursor as a single team — with proven CTF results like 200/200 on the NYU benchmark.</description><content:encoded><![CDATA[<p>Ever watched a single AI agent spin in circles on a complex problem? I have. Claude Code will happily burn 20 minutes exploring a dead end with no way to pull itself out. That&rsquo;s the exact problem Muteki was built to solve.</p>
<p><strong>Muteki</strong> (無敵, &ldquo;Invincible&rdquo;) is an open-source multi-agent swarm (217★ on GitHub and growing fast) that orchestrates Claude Code, Codex, and Cursor as a coordinated team. Instead of one agent talking to itself, it dispatches different agents to different sub-problems, pools their findings on a shared blackboard, and keeps the whole operation moving toward the goal. And it&rsquo;s not theoretical: it placed 8th at RIFFHACK 2026 with zero human intervention and scored 200/200 on the NYU CTF benchmark.</p>
<h2 id="what-makes-this-multi-agent-swarm-different">What Makes This Multi-Agent Swarm Different</h2>
<p>Most &ldquo;multi-agent&rdquo; frameworks are abstract — they chain LLM calls with prompts. But Muteki is a different beast. It shells out to actual CLI coding agents and coordinates them through a scheduling architecture with four distinct phases:</p>
<ul>
<li><strong>Prepare</strong> — builds the blackboard, stages attachments, health-checks engines</li>
<li><strong>Recon Race</strong> — multiple agents scan the whole challenge in parallel for breadth-first recon</li>
<li><strong>Coordination Loop</strong> — agents claim tasks, report results, re-plan every ~2 seconds</li>
<li><strong>Wind-down</strong> — persists findings, emits reports, cleans up worker scratch dirs</li>
</ul>
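<p>To make the four phases concrete, here is a minimal Python sketch of how they could be sequenced. It is an illustration under loud assumptions, not Muteki&rsquo;s scheduler: <code>run_swarm</code>, the in-memory <code>board</code> dict, and the round-robin task claiming are all hypothetical, and fixed rounds stand in for the ~2-second re-plan timer.</p>

```python
from collections import deque

PHASES = ("prepare", "recon_race", "coordination_loop", "wind_down")


def run_swarm(engines, tasks, max_rounds=10):
    """Hypothetical walk through the four phases (not Muteki's real code)."""
    log = []

    # Prepare: an in-memory dict stands in for the real blackboard.
    board = {"facts": [], "done": []}
    log.append("prepare")

    # Recon Race: every engine takes a breadth-first look at the challenge.
    for engine in engines:
        board["facts"].append(f"{engine}: recon notes")
    log.append("recon_race")

    # Coordination Loop: engines claim tasks round-robin, one round per re-plan.
    queue = deque(tasks)
    for _ in range(max_rounds):
        if not queue:
            break
        for engine in engines:
            if not queue:
                break
            board["done"].append((engine, queue.popleft()))
    log.append("coordination_loop")

    # Wind-down: a real run would persist findings and clean up scratch dirs.
    log.append("wind_down")
    return log, board


log, board = run_swarm(
    ["claude", "codex", "cursor"],
    ["source review", "endpoint scan", "exploit", "report"],
)
print(log)
print(board["done"])
```

<p>The point of the sketch is the ordering: recon fills the board before any task is claimed, and the loop ends when the queue is empty or the round budget runs out.</p>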
<p>The key innovation is the shared blackboard: a SQLite database that all agents read and write through a <code>muteki-blackboard</code> skill. Facts accumulate, dead ends are never retried, and a flag is only accepted when it appears verbatim in live execution output. The result is a heterogeneous set of agents working from one shared body of evidence.</p>
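<p>The three blackboard rules above can be sketched with Python&rsquo;s built-in <code>sqlite3</code> module. Everything named here is an assumption for illustration: the table names, the helper functions, and the example notes are hypothetical, and the review does not describe Muteki&rsquo;s real schema or the interface of the <code>muteki-blackboard</code> skill.</p>

```python
import sqlite3


def open_blackboard(path=":memory:"):
    """Create a tiny blackboard; this schema is a hypothetical stand-in."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS facts (agent TEXT, note TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS dead_ends (approach TEXT PRIMARY KEY)")
    return db


def add_fact(db, agent, note):
    # Facts only accumulate; nothing here deletes or overwrites them.
    db.execute("INSERT INTO facts VALUES (?, ?)", (agent, note))


def mark_dead_end(db, approach):
    # INSERT OR IGNORE keeps the call idempotent if two agents report the same dead end.
    db.execute("INSERT OR IGNORE INTO dead_ends VALUES (?)", (approach,))


def should_try(db, approach):
    # An approach already recorded as a dead end is never retried.
    row = db.execute(
        "SELECT 1 FROM dead_ends WHERE approach = ?", (approach,)
    ).fetchone()
    return row is None


def accept_flag(candidate, execution_output):
    # Accept a flag only if it appears verbatim in real execution output.
    return bool(candidate) and candidate in execution_output


db = open_blackboard()
add_fact(db, "codex", "login form posts to /auth")
mark_dead_end(db, "sqli on /login")
fact_count = db.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
```

<p>The verbatim check is the interesting design choice: an agent that merely guesses a flag gets rejected, because the string never showed up in output the swarm actually produced.</p>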
<p>I tested this on a retired web challenge just to see the coordination in action. Within 30 seconds of hitting &ldquo;Run,&rdquo; three agents were working in parallel — one digging through source code, another scanning endpoints, a third writing exploit logic. Watching them coordinate on that shared blackboard feels like watching a real team, not a single model trying to do everything at once.</p>
<h2 id="real-results-not-hype">Real Results, Not Hype</h2>
<p>The README is refreshingly light on marketing fluff. Here&rsquo;s what actually matters:</p>
<ul>
<li><strong>RIFFHACK 2026</strong>: 8th place, fully autonomous for 3 hours, zero human takeover. AK&rsquo;d (solved) every challenge.</li>
<li><strong>Blackmaze range</strong>: A pentest range with zero solves for three months. Muteki speed-ran first blood in 2 hours.</li>
<li><strong>HackTheBox</strong>: AK&rsquo;d Insane and Hard difficulty categories across the board.</li>
<li><strong>NYU CTF Benchmark</strong>: <strong>200/200 — 100% solve rate</strong> across 6 major categories (CSAW 2017–2023). 36/36 hard/expert challenges solved. Median solve time ~2–4 minutes. Cumulative API cost: ~$214 for 370M tokens across all 200 challenges.</li>
</ul>
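<p>For a sense of scale, the NYU benchmark figures break down per challenge as follows. This is plain arithmetic on the rounded numbers quoted above (~$214, 370M tokens, 200 challenges), so treat the results as approximate; the variable names are mine.</p>

```python
# Rounded figures quoted in the list above.
total_cost_usd = 214
total_tokens = 370_000_000
challenges = 200

# Average spend and token usage per challenge.
cost_per_challenge = total_cost_usd / challenges
tokens_per_challenge = total_tokens / challenges

# Blended price per million tokens across all engines.
cost_per_million_tokens = total_cost_usd / (total_tokens / 1_000_000)

print(f"~${cost_per_challenge:.2f} per challenge")
print(f"~{tokens_per_challenge:,.0f} tokens per challenge")
print(f"~${cost_per_million_tokens:.2f} per million tokens")
```

<p>That works out to roughly a dollar and just under two million tokens per challenge on average.</p>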
<p>Those aren&rsquo;t cherry-picked single runs. They&rsquo;re the output of a month of engineering optimization, and because the project is fully open-source under AGPL-3.0, anyone can try to reproduce them.</p>
<h2 id="quick-start-running-the-swarm">Quick Start: Running the Swarm</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>git clone https://github.com/FishCodeTech/muteki.git
</span></span><span style="display:flex;"><span>cd muteki
</span></span><span style="display:flex;"><span>./init.sh
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Set at least one API key in .env</span>
</span></span><span style="display:flex;"><span>./run.sh web
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Visit http://localhost:3001</span>
</span></span></code></pre></div><p>That&rsquo;s it. The whole setup took me about 30 seconds from clone to web UI, which gives you a command deck where you configure engines, upload challenge files, and hit run. Everything&rsquo;s documented in the README, including a thorough security warning about never running this on your main workstation.</p>
<h2 id="what-to-watch-out-for">What to Watch Out For</h2>
<p>Now, Muteki is purpose-built for CTF and security challenges. So unless you&rsquo;re actively competing or running benchmarks, its day-to-day usefulness is limited. The AGPL-3.0 license means commercial integration needs careful handling. And the full setup — Docker, multiple API keys, engineered worker images — is serious overkill if all you want is a single coding assistant.</p>
<p>The project also explicitly warns: run it in an isolated environment. It drives agents that execute commands and reach target services. This isn&rsquo;t something you install on your daily driver.</p>
<h2 id="bottom-line-proof-that-multi-agent-swarms-work">Bottom Line: Proof That Multi-Agent Swarms Work</h2>
<p>Honestly, Muteki is the most impressive open-source demonstration of multi-agent orchestration I&rsquo;ve seen this year. It skips the hype and ships benchmarks, a clean architecture, and reproducible results. If you&rsquo;re building agent systems or just want to see what a real multi-agent swarm looks like in action, this repo is worth your time. It pairs nicely with today&rsquo;s look at <a href="/posts/junction-vscode-panel-7-local-ai-agents-2026/">Junction&rsquo;s agent UI</a> and <a href="/posts/lemma-open-source-workspace-review-2026/">Lemma&rsquo;s agent workspace</a> — together they cover the full agent tooling triptych.</p>
]]></content:encoded></item></channel></rss>