Muteki: The Open-Source Multi-Agent Swarm Built for CTF Competitions

Ever watched a single AI agent spin in circles on a complex problem? I have. Claude Code will happily burn 20 minutes exploring a dead end with no way to pull itself out. That’s the exact problem Muteki was built to solve.

Muteki (無敵, “Invincible”) is an open-source multi-agent swarm (217★ on GitHub and growing fast) that orchestrates Claude Code, Codex, and Cursor as a coordinated team. Instead of one agent talking to itself, it dispatches different agents to different sub-problems, pools their findings on a shared blackboard, and keeps the whole operation moving toward the goal. And it’s not theoretical: it placed 8th at RIFFHACK 2026 with zero human intervention and scored 200/200 on the NYU CTF benchmark.

What Makes This Multi-Agent Swarm Different

Most “multi-agent” frameworks stay at the prompt level: they chain LLM calls together and call the result a team. Muteki is a different beast. It shells out to actual CLI coding agents and coordinates them through a scheduling architecture with four distinct phases:

Prepare — builds the blackboard, stages attachments, health-checks engines
Recon Race — multiple agents scan the whole challenge in parallel for breadth-first recon
Coordination Loop — agents claim tasks, report results, re-plan every ~2 seconds
Wind-down — persists findings, emits reports, cleans up worker scratch dirs
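To make the four phases concrete, here is a minimal single-process sketch of how such a schedule might look. It is not Muteki's code: the names `Phase`, `Task`, `claim_next`, and `run_swarm` are hypothetical, the "work" is a placeholder string, and a real swarm would run agents in parallel rather than in a loop. Only the phase order and the roughly 2-second re-plan cadence come from the description above.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Phase(Enum):
    PREPARE = "prepare"
    RECON_RACE = "recon_race"
    COORDINATION_LOOP = "coordination_loop"
    WIND_DOWN = "wind_down"


@dataclass
class Task:
    name: str
    claimed_by: Optional[str] = None
    result: Optional[str] = None


def claim_next(tasks: List[Task], agent: str) -> Optional[Task]:
    """Give `agent` the first unclaimed task, or None if nothing is left."""
    for task in tasks:
        if task.claimed_by is None:
            task.claimed_by = agent
            return task
    return None


def run_swarm(
    agents: List[str],
    task_names: List[str],
    replan_interval: float = 2.0,  # assumed from the "~2 seconds" cadence
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[Phase], List[Task]]:
    """Walk the four phases once and return (phases visited, final tasks)."""
    if not agents:
        raise ValueError("at least one agent is required")

    visited = [Phase.PREPARE]
    tasks = [Task(name) for name in task_names]  # stand-in for building the blackboard

    visited.append(Phase.RECON_RACE)
    # Placeholder: a real recon race would launch every agent in parallel here.

    visited.append(Phase.COORDINATION_LOOP)
    while any(task.result is None for task in tasks):
        for agent in agents:
            task = claim_next(tasks, agent)
            if task is not None:
                task.result = f"{agent} finished {task.name}"  # placeholder work
        sleep(replan_interval)  # pause, then re-plan on a fixed cadence

    visited.append(Phase.WIND_DOWN)
    return visited, tasks


if __name__ == "__main__":
    phases, tasks = run_swarm(
        ["claude-code", "codex", "cursor"],
        ["read source", "scan endpoints", "write exploit", "verify flag"],
        sleep=lambda _seconds: None,  # skip real waiting in the demo
    )
    for task in tasks:
        print(task.name, "->", task.claimed_by)
```

Passing `sleep` as a parameter is just a convenience for trying the loop without waiting; the interesting part is that a task is claimed before any work starts, so two agents never pick up the same one.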

The key innovation is the shared blackboard: a SQLite database that all agents read and write through a muteki-blackboard skill. Facts accumulate, dead ends are recorded so they are never retried, and a flag is accepted only when it appears verbatim in live execution output. The result is heterogeneous agents working from one shared body of evidence.
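The blackboard idea is easy to sketch with Python's built-in `sqlite3` module. The schema, the `Blackboard` class, and the `accept_flag` helper below are all assumptions made for illustration; the article does not show Muteki's actual tables or the interface of its muteki-blackboard skill. The sketch only mirrors the three behaviors described: facts accumulate, dead ends are remembered, and a flag counts only if it appears verbatim in execution output.

```python
import sqlite3
from typing import List

# Hypothetical schema; Muteki's real tables are not described in the article.
SCHEMA = """
CREATE TABLE IF NOT EXISTS facts (
    id      INTEGER PRIMARY KEY,
    agent   TEXT NOT NULL,
    content TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS dead_ends (
    approach TEXT PRIMARY KEY,
    agent    TEXT NOT NULL,
    reason   TEXT NOT NULL
);
"""


class Blackboard:
    """A tiny shared store that every agent reads from and writes to."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.executescript(SCHEMA)

    def add_fact(self, agent: str, content: str) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO facts (agent, content) VALUES (?, ?)",
                (agent, content),
            )

    def facts(self) -> List[str]:
        rows = self.db.execute("SELECT content FROM facts ORDER BY id")
        return [row[0] for row in rows]

    def mark_dead_end(self, agent: str, approach: str, reason: str) -> None:
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO dead_ends (approach, agent, reason) "
                "VALUES (?, ?, ?)",
                (approach, agent, reason),
            )

    def is_dead_end(self, approach: str) -> bool:
        row = self.db.execute(
            "SELECT 1 FROM dead_ends WHERE approach = ?", (approach,)
        ).fetchone()
        return row is not None


def accept_flag(candidate: str, execution_output: str) -> bool:
    """Accept a flag only if it appears verbatim in captured output."""
    return bool(candidate) and candidate in execution_output


if __name__ == "__main__":
    board = Blackboard()
    board.add_fact("codex", "login form at /admin")
    board.mark_dead_end("cursor", "sqli on /search", "queries are parameterized")
    print(board.facts())
    print(board.is_dead_end("sqli on /search"))
    print(accept_flag("flag{abc}", "server said: flag{abc}"))
```

An agent would check `is_dead_end` before starting an approach, which is how a failed idea gets tried once rather than once per agent. The verbatim check in `accept_flag` is a guard against a model inventing a plausible-looking flag that never appeared in real output.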

I tested this on a retired web challenge just to see the coordination in action. Within 30 seconds of hitting “Run,” three agents were working in parallel: one digging through source code, another scanning endpoints, a third writing exploit logic. Watching them coordinate through the blackboard felt like watching a real team, not a single model trying to do everything at once.

Real Results, Not Hype

The README is refreshingly light on marketing fluff. Here’s what actually matters:

RIFFHACK 2026: 8th place, fully autonomous for 3 hours, zero human takeover. Solved every challenge (an “AK,” or all-kill, in CTF slang).
Blackmaze range: A pentest range with zero solves for three months. Muteki speed-ran first blood in 2 hours.
HackTheBox: solved every challenge in the Insane and Hard difficulty categories.
NYU CTF Benchmark: 200/200 — 100% solve rate across 6 major categories (CSAW 2017–2023). 36/36 hard/expert challenges solved. Median solve time ~2–4 minutes. Cumulative API cost: ~$214 for 370M tokens across all 200 challenges.
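The benchmark totals are easier to judge as per-challenge averages. The short calculation below uses only the figures quoted above; the cost and token counts are the approximate totals from the README, so the results are approximate too.

```python
# Figures quoted above (approximate totals for the NYU CTF benchmark run).
TOTAL_COST_USD = 214.0
TOTAL_TOKENS = 370_000_000
CHALLENGES = 200
HARD_OR_EXPERT = 36

cost_per_challenge = TOTAL_COST_USD / CHALLENGES
tokens_per_challenge = TOTAL_TOKENS / CHALLENGES
cost_per_million_tokens = TOTAL_COST_USD / (TOTAL_TOKENS / 1_000_000)
hard_share = HARD_OR_EXPERT / CHALLENGES

print(f"${cost_per_challenge:.2f} per challenge")                    # $1.07 per challenge
print(f"{tokens_per_challenge / 1e6:.2f}M tokens per challenge")     # 1.85M tokens per challenge
print(f"${cost_per_million_tokens:.2f} per million tokens")          # $0.58 per million tokens
print(f"{hard_share:.0%} of challenges rated hard/expert")           # 18% of challenges rated hard/expert
```

In other words, the run averaged roughly a dollar and just under two million tokens per solved challenge.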

Those aren’t cherry-picked single runs. They’re the output of a month of engineering optimization, and they’re reproducible because the project is fully open-source under AGPL-3.0.

Quick Start: Running the Swarm

git clone https://github.com/FishCodeTech/muteki.git
cd muteki
./init.sh
# Set at least one API key in .env
./run.sh web
# Visit http://localhost:3001

That’s it. The whole setup took me about 30 seconds from clone to web UI, which gives you a command deck where you configure engines, upload challenge files, and hit run. Everything’s documented in the README, including a thorough security warning about never running this on your main workstation.

What to Watch Out For

Muteki is purpose-built for CTF and security challenges, so unless you’re actively competing or running benchmarks, its day-to-day usefulness is limited. The AGPL-3.0 license means commercial integration needs careful handling. And the full setup (Docker, multiple API keys, engineered worker images) is serious overkill if all you want is a single coding assistant.

The project also explicitly warns: run it in an isolated environment. It drives agents that execute commands and reach target services. This isn’t something you install on your daily driver.

Bottom Line: Proof That Multi-Agent Swarms Work

Honestly, Muteki is the most impressive open-source demonstration of multi-agent orchestration I’ve seen this year. It skips the hype and ships benchmarks, a clean architecture, and reproducible results. If you’re building agent systems or just want to see what a real multi-agent swarm looks like in action, this repo is worth your time. It pairs nicely with today’s look at Junction’s agent UI and Lemma’s agent workspace — together they cover the full agent tooling triptych.
