umadev Review: I Tested This Open-Source AI Project Director

Disclosure: I may earn a commission if you sign up through links on this page. This review is based on my own testing — no sponsor influence.

You know the drill. You tell Claude Code to “build a todo app with Postgres,” it cranks out code in 90 seconds, and you think you’re done. Then you look at what it actually built — mismatched API paths, hardcoded colors, placeholder images, and TODOs scattered through every file. It says “done” but it’s not done.

I’ve been there more times than I can count. So when I found umadev, an open-source Rust project that claims to turn AI coding agents into a structured delivery team, I had to try it. It picked up 122 stars in its first week on GitHub. That’s not a lot in absolute terms, but for a tool this niche it tells me other developers feel the same pain.

Here’s the short version: umadev doesn’t replace your AI coding CLI. It wraps around it — Claude Code, Codex, or OpenCode — and runs the show like a real project director. Think of it as the producer who yells “cut” when the scene isn’t right, not the actor on stage.

What Is umadev?

umadev is a single Rust binary (MIT license, v1.0.7) that you install via npm. Yes, npm for a Rust binary: it’s a distribution shim, not a Node app. Under the hood it’s pure Rust, cross-compiled for macOS, Linux, and Windows.

The core idea is dead simple: you describe what you want in plain language, and umadev orchestrates your base coding agent through a structured delivery pipeline. The base does the coding. umadev does the directing.

It runs a 9-role review team, which sounds excessive on paper but makes sense in practice:

Role	What They Do
Director	Owns the plan, drives the main session, aggregates all verdicts
Product Manager	Checks scope, acceptance criteria, PRD completeness
Architect	Reviews data model, APIs, scalability decisions
UI/UX Designer	Enforces design tokens, typography, component states
Frontend Engineer	Writes UI code and handles frontend reviews
Backend Engineer	Writes server code and handles backend reviews
QA Engineer	Checks test coverage, edge cases, runtime behavior
Security	Runs SAST, secret scanning, auth pattern review
DevOps	Validates Dockerfile, CI config, deployment setup

The key design choice here is that these roles never chat with each other. They coordinate exclusively through shared artifact files and structured verdicts. No endless agent-to-agent conversation loops. That alone makes me trust the architecture more than most “multi-agent” systems I’ve tested.
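To make the "files and verdicts, not chat" idea concrete, here is a minimal sketch of how I picture it. This is not umadev's actual format: the JSON fields, file names, and the `write_verdict` and `aggregate` helpers are all my own assumptions for illustration.

```python
import json
from pathlib import Path


def write_verdict(artifact_dir: Path, role: str, approved: bool, findings: list[str]) -> Path:
    """Each role drops one JSON verdict into a shared folder (hypothetical layout)."""
    artifact_dir.mkdir(parents=True, exist_ok=True)
    # Turn "UI/UX Designer" into a filesystem-safe name like "ui-ux-designer".
    slug = "".join(c if c.isalnum() else "-" for c in role.lower())
    path = artifact_dir / f"{slug}.json"
    path.write_text(json.dumps({"role": role, "approved": approved, "findings": findings}))
    return path


def aggregate(artifact_dir: Path) -> dict:
    """Director-style roll-up: approve only if at least one verdict exists and all approve."""
    verdicts = [json.loads(p.read_text()) for p in sorted(artifact_dir.glob("*.json"))]
    blocking = [
        f"{v['role']}: {finding}"
        for v in verdicts
        if not v["approved"]
        for finding in v["findings"]
    ]
    approved = bool(verdicts) and all(v["approved"] for v in verdicts)
    return {"approved": approved, "blocking": blocking}
```

The point of the design is that no role ever reads another role's conversation. Each one only reads and writes files, so there is nothing to loop on.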

Quick Start: Here’s What Happened When I Ran It

I installed umadev on my Ryzen 9 workstation running Ubuntu 24.04, with Claude Code already logged in:

npm install -g umadev

The npm install finished in about 12 seconds. Behind the scenes it pulled the Rust binary and a ~224 MB local embedding model (multilingual-e5-small) for offline vector search. No Docker, no Python deps, no API keys to configure.

Then I ran:

umadev

First launch shows a clean TUI — markdown rendering, syntax-highlighted code, the works. It asked me to pick a backend from three options: Claude Code, Codex, or OpenCode. I picked Claude Code. That was it. No config files to edit.

I gave it a test prompt:

build a todo app with a Postgres backend

Here’s what surprised me: it didn’t start coding immediately. Instead it showed an intent card (“full build, entering the delivery flow”), then spent roughly 40 seconds planning. It drafted three documents before a line of code was written: a PRD, an architecture doc, and a UI/UX spec. Each showed up in output/ as a markdown file. Then it paused for my review at the docs_confirm gate.

That pause is important. Most AI coding tools rush straight to implementation. umadev forces a human checkpoint before code, which is exactly what I’d do with a junior developer on my team.

After I approved the docs, it built an execution plan with dependency ordering, wrote the frontend (React + TypeScript), paused again for a live preview, then wrote the backend, ran a full quality gate, and handed me a delivery pack with a scorecard and proof archive.
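"Dependency ordering" is the part that's easy to gloss over, so here is a small sketch of what it means. The task names and the graph are my own reconstruction of the run I saw, not umadev's internal plan format. The ordering uses Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
plan = {
    "prd": set(),
    "architecture": {"prd"},
    "uiux_spec": {"prd"},
    "frontend": {"architecture", "uiux_spec"},
    "backend": {"architecture"},
    "quality_gate": {"frontend", "backend"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return tasks so that every task comes after all of its dependencies.

    Raises graphlib.CycleError if the plan contains a circular dependency.
    """
    return list(TopologicalSorter(graph).static_order())
```

Calling `execution_order(plan)` always puts `prd` first and `quality_gate` last. The tasks in between can legally come out in more than one order, as long as each one follows its dependencies.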

The whole thing took about 14 minutes for a basic todo app. That’s not fast compared to raw Claude Code output, but the result was actually shippable: no fake data, no mismatched routes, no random emoji icons.

What Makes It Different From Raw Claude Code

Most people use Claude Code like a smart autocomplete — type a prompt, get code, fix issues one by one until it works. umadev enforces a completely different workflow:

Dimension	Raw Claude Code	umadev + Claude Code
Planning	None — starts coding immediately	PRD, architecture, UI/UX docs first
Quality	Self-assessment (“looks good to me”)	Deterministic gate: build, test, lint, contract check
Governance	None	~112 rules: no emoji icons, no leaked secrets, no AI-slop patterns
Frontend↔Backend	Manual validation	Mechanical contract check via `umadev-contract`
Delivery	“Done” (they say)	Scorecard, proof pack, compliance mapping
Learning	Stateless each session	Self-evolving memory — records mistakes, reflects on recurrence

But here’s the thing — and I want to be honest about this — umadev is slower for simple tasks. If you need a quick 20-line bash script or a one-file utility, raw Claude Code is faster. The overhead of planning, reviewing, and gating only pays off when you’re building something with actual structure.

It’s also worth comparing umadev to other tools I’ve covered on ToolGenix:

CowAgent is a 24/7 AI butler — great for always-on chat and automation, but it’s not a software delivery engine. CowAgent answers questions and runs tools. umadev ships projects.
MetaHarness generates agent scaffolds — the “create-react-app for agent frameworks” — but it stops at the skeleton. umadev takes you from skeleton to shipped product with governance and proof.

The umadev Quality Gate: This Is Where It Shines

The quality gate is umadev’s killer feature, and here’s why.

After the base finishes writing code, umadev runs an independent check: build, test, lint, typecheck, contract validation, runtime probe. It doesn’t ask the model “is this good?” — it checks the actual artifacts. The runtime probe starts the app and hits its routes, writing runtime-proof.json as evidence that the app actually responds.
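The difference between "ask the model" and "check the artifacts" comes down to exit codes. Below is a rough sketch of a deterministic gate under my own assumptions: `run_gate` and `gate_passed` are hypothetical names, and the commands are stand-ins so the example runs anywhere. It is not how umadev implements its gate.

```python
import subprocess
import sys


def run_gate(steps: dict[str, list[str]]) -> dict[str, bool]:
    """Run each command; a step passes only if the process exits with code 0."""
    results = {}
    for name, cmd in steps.items():
        try:
            results[name] = subprocess.run(cmd, capture_output=True).returncode == 0
        except FileNotFoundError:
            # A missing tool counts as a failed step in this sketch.
            results[name] = False
    return results


def gate_passed(results: dict[str, bool]) -> bool:
    """The gate is green only when every single step passed."""
    return all(results.values())


# Stand-in commands: replace these with your real build, test, and lint commands.
steps = {
    "build": [sys.executable, "-c", "raise SystemExit(0)"],
    "test": [sys.executable, "-c", "raise SystemExit(1)"],
}
```

With the stand-in commands above, `gate_passed(run_gate(steps))` is `False` because the "test" step exits with a non-zero code. No amount of model confidence changes that.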

In my test, it caught one thing I wouldn’t have noticed: the frontend was calling /api/v1/todos but the backend route was defined as /api/v1/items. umadev’s contract checker found the mismatch and flagged it as a blocking finding. That’s exactly the kind of bug that costs me 20 minutes of debugging in a normal workflow.
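At its simplest, a contract check like this is a set comparison. Here is a minimal sketch using the paths from my test run. The `contract_mismatches` function and its output keys are hypothetical. I don't know how umadev extracts routes from source files, so the sketch assumes the two sets have already been collected.

```python
def contract_mismatches(frontend_calls: set[str], backend_routes: set[str]) -> dict[str, list[str]]:
    """Compare the endpoints the frontend calls with the routes the backend defines."""
    return {
        # Calls that would fail at runtime because no route handles them.
        "called_but_undefined": sorted(frontend_calls - backend_routes),
        # Routes nothing calls: often a renamed endpoint or dead code.
        "defined_but_unused": sorted(backend_routes - frontend_calls),
    }


frontend_calls = {"GET /api/v1/todos", "POST /api/v1/todos"}
backend_routes = {"GET /api/v1/items", "POST /api/v1/todos"}
report = contract_mismatches(frontend_calls, backend_routes)
```

For that input, `report["called_but_undefined"]` is `["GET /api/v1/todos"]` and `report["defined_but_unused"]` is `["GET /api/v1/items"]`, which is the same shape of mismatch umadev flagged for me.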

Governance checks run on every file write: ~112 rules covering UI quality (no emoji-as-icons), security (no leaked API keys), architecture (no hardcoded colors), and language-specific patterns. Every rule is configurable in .umadev/rules.toml. Importantly, the checks are fail-open: a bug in the governor never blocks your work.
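"Fail-open" is worth spelling out, because it is the opposite of how the quality gate behaves. Here is a small sketch of the idea under my own assumptions. The rule functions and `check_file` are hypothetical, and the regex is a simplified stand-in for a real "no hardcoded colors" rule, not one of umadev's.

```python
import re
from typing import Callable

Rule = Callable[[str], list[str]]


def no_hardcoded_hex_colors(text: str) -> list[str]:
    """Flag six-digit hex colors such as #ff0000 (simplified illustration)."""
    return [f"hardcoded color {m}" for m in re.findall(r"#[0-9a-fA-F]{6}\b", text)]


def buggy_rule(text: str) -> list[str]:
    """Simulates a rule that has a bug of its own."""
    raise RuntimeError("rule crashed")


def check_file(text: str, rules: list[Rule]) -> list[str]:
    """Run every rule and collect findings; a crashing rule is skipped, not fatal."""
    findings: list[str] = []
    for rule in rules:
        try:
            findings.extend(rule(text))
        except Exception:
            # Fail-open: a broken rule must never block the file write.
            continue
    return findings
```

Running `check_file("color: #ff0000;", [buggy_rule, no_hardcoded_hex_colors])` still reports the hardcoded color even though the first rule raised. The trade-off is that a broken rule silently stops protecting you, so in a real system I'd want the skipped rule logged somewhere.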

What Could Be Better in umadev

I’m not going to pretend this is perfect. Here are the rough edges I hit:

Cold start is slow. The first response takes 30–60 seconds because it pre-loads the firmware (system prompt, knowledge base, repo map) into the base. After that, subsequent turns are fast, but that initial wait is noticeable.
Heavy for small tasks. A one-line bug fix doesn’t need a PRD and a 9-role review. umadev does scale down — the router handles this — but in practice I found the planning overhead still adds friction for trivial changes.
The embedding model is a 224 MB download. Optional, and it degrades gracefully to BM25-only, but on a metered connection that’s a big chunk of data for a “quick install.”
Still early. 122 stars, first release June 19, 2026. The spec is solid and the architecture is Rust-grounded, but the ecosystem (community plugins, integrations, docs) is minimal. You’re an early adopter if you jump on this now.

Who Should Use This

If you…	umadev is worth a try
Build full-stack apps with AI coding agents	✅ Especially if you’ve hit the “it says done but it’s not” wall
Manage a small team evaluating AI-assisted delivery	✅ The scorecard and proof pack make handoffs auditable
Want governance over AI-generated code	✅ ~112 rules you can configure per-project
Need a quick script or one-file utility	❌ Overkill — stick with raw Claude Code or Codex
Deploy agent-driven workflows in production	✅ This is exactly the use case

The Bottom Line

umadev doesn’t try to replace how you write code with AI. Instead it solves a different problem — one that becomes painfully obvious the moment you try to ship something real: AI coding agents are great at writing code and terrible at managing a project.

It costs nothing (MIT license, open source). It works with the tools you already use (Claude Code, Codex, OpenCode). And while it’s early — 122 stars, less than a week old at the time of writing — the engineering is grounded in a real spec, a real Rust codebase, and real delivery artifacts.

I’d like to see what this project looks like in six months. But right now, if you’re building anything beyond a one-file prototype with AI agents, umadev is worth your Sunday afternoon.

Want to run umadev as a 24/7 team service? You’ll need a VPS (affiliate links below). A $6/month DigitalOcean Droplet is plenty for the Rust binary plus a small Postgres instance. Or if you prefer multi-region coverage, Vultr starts at $2.50/month. Either way, the project director runs all day without your laptop.

Disclosure: Some links below are affiliate links. If you sign up through them, I may earn a commission at no extra cost to you.

Vultr — starts at $2.50/mo
DigitalOcean — $200 credit for new users
