PilotDeck Review: Open-Source Agent OS with Smart Memory

Sat, 13 Jun 2026 14:00:00 +0800

A few weeks ago I had three different AI projects running at the same time: a codebase refactor, a technical spec review, and some competitive analysis. Sounds manageable, right? It wasn’t. Every time I switched between projects in Claude Code, the context was gone. The agent mixed up code from the refactor with questions about the spec, and I spent more time re-explaining context than actually getting work done.

That’s the problem PilotDeck sets out to fix. And it does it better than anything else I’ve tried.

PilotDeck is an open-source “agent operating system” built by OpenBMB, the Tsinghua-affiliated lab behind the MiniCPM models. It hit 3,223 stars on GitHub in its first two weeks, with 339 forks and active daily commits. Not bad for a project that’s only been public since late May.

The short version: PilotDeck solves multi-project context pollution with three real innovations — WorkSpace-level isolation, white-box memory you can read and edit, and smart model routing that cuts token costs by up to 70% in certain scenarios. Plus it runs 24/7 in the background on your own server. If you manage more than one AI project at a time, this is worth your attention.

But let me back up and explain why this matters — because the problem is bigger than most people realize.

Why Your Current AI Agent Setup Is Broken

Look, if you’ve used Claude Code or Cursor for more than a week, you’ve hit this wall: by default, every session starts fresh. The agent has no memory of what you discussed yesterday, last hour, or even five minutes ago in a different window.

That’s by design. These tools are built for single-session coding tasks. They’re optimized for “write this function” or “refactor this file” rather than for running multiple projects in parallel. But the way developers actually work is messy. You have a production bug on project A, a feature branch on project B, and research notes on project C, and your AI agent treats each of them like a stranger.

The result? Context pollution, token waste from repeatedly re-explaining project details, and the annoying feeling that your agent has the memory of a goldfish.

I’ve been there. More times than I’d like to admit.

What Makes PilotDeck Different

PilotDeck treats each project as its own WorkSpace — a fully isolated environment with its own file system context, memory store, model configuration, and task queue. Think of it like Docker containers for your AI agents, but with a lot more intelligence baked in.

WorkSpace Isolation

This is the foundation. Each WorkSpace gets its own sandboxed environment. Switch from your code refactor project to your research paper digest — the agent doesn’t mix them up because the contexts literally don’t overlap.

I tested this by creating two WorkSpaces side by side. In WorkSpace A, I dropped a Python codebase and asked the agent to document the API surface. In WorkSpace B, I pasted a research paper about transformer architectures and asked for a summary. Result: clean separation. The agent in WorkSpace A never once referenced transformers, and WorkSpace B never mentioned Python functions.

Sounds simple, but neither Claude Code nor Cursor handles this properly.
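
To make the isolation idea concrete, here’s a minimal Python sketch of how I picture it. This is my own illustration, not PilotDeck’s actual code: the WorkSpace class, its fields, and the model names are all hypothetical. The point is simply that each WorkSpace owns its memory, model setting, and task queue, so nothing from one project can end up in another project’s prompt.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkSpace:
    """Hypothetical stand-in for one isolated project environment."""
    name: str
    model: str                                   # per-workspace model setting
    memory: dict = field(default_factory=dict)   # per-workspace memory store
    tasks: deque = field(default_factory=deque)  # per-workspace task queue

    def remember(self, key: str, value: str) -> None:
        self.memory[key] = value

    def build_context(self) -> str:
        """Assemble prompt context from this workspace's memory only."""
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.memory.items()))


# Two side-by-side workspaces, mirroring the test described above.
refactor = WorkSpace(name="api-docs", model="frontier-model")
paper = WorkSpace(name="paper-digest", model="small-model")

refactor.remember("language", "Python")
paper.remember("topic", "transformer architectures")

print(refactor.build_context())  # language: Python
print(paper.build_context())     # topic: transformer architectures
```

Because every instance gets its own dict and queue (that’s what default_factory is for), there is no shared state for the two projects to leak through.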

White-Box Memory

This is the feature that got me excited. PilotDeck’s memory isn’t a black box: you can actually see what the agent remembers, edit it, and even delete specific entries.

Here’s what I mean: in Claude Code, when the agent references something from earlier in the conversation, you can’t easily see what it’s actually pulling from context. In PilotDeck, the memory store is presented as a searchable, editable list. Every key-value pair and every context reference is visible and modifiable.

I ran a test where I intentionally fed the agent wrong information about a config file path, then opened the memory panel, found the incorrect entry, and deleted it. The agent picked up the correction on the next task. That level of control is rare in the agent space right now.
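
If you’re wondering what “searchable, editable memory” amounts to in practice, here’s a rough Python sketch of the workflow from my test. It’s an assumption-heavy illustration, not PilotDeck’s real data model: the WhiteBoxMemory class, its method names, and the file paths are all made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemoryEntry:
    entry_id: int
    key: str
    value: str


class WhiteBoxMemory:
    """Hypothetical memory store where every entry can be inspected."""

    def __init__(self) -> None:
        self._entries: Dict[int, MemoryEntry] = {}
        self._next_id = 1

    def add(self, key: str, value: str) -> int:
        entry_id = self._next_id
        self._entries[entry_id] = MemoryEntry(entry_id, key, value)
        self._next_id += 1
        return entry_id

    def search(self, text: str) -> List[MemoryEntry]:
        """Case-insensitive substring match on keys and values."""
        needle = text.lower()
        return [e for e in self._entries.values()
                if needle in e.key.lower() or needle in e.value.lower()]

    def edit(self, entry_id: int, new_value: str) -> None:
        self._entries[entry_id].value = new_value

    def delete(self, entry_id: int) -> None:
        del self._entries[entry_id]


memory = WhiteBoxMemory()
memory.add("config_path", "/etc/app/wrong.yaml")  # deliberately wrong entry
memory.add("language", "Python")

# Find the bad entry, remove it, and store the corrected value.
bad = memory.search("config")[0]
memory.delete(bad.entry_id)
memory.add("config_path", "/etc/app/config.yaml")

print([e.value for e in memory.search("config")])  # ['/etc/app/config.yaml']
```

The design choice that matters is that entries are plain, addressable records. Anything the agent will read on the next task is something you can find, fix, or delete first.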

Smart Model Routing

This is where the cost savings come in. PilotDeck can route different types of tasks to different models automatically. Simple classification tasks go to a lightweight model (think Mistral 7B or GPT-4o mini), while complex code generation goes to a frontier model. The system handles the routing, so you don’t have to think about it.
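
Conceptually, routing like this boils down to a lookup from task type to model tier. Here’s a minimal Python sketch under my own assumptions: the task categories, the model labels, and the route function are hypothetical, and PilotDeck’s real routing logic is presumably more sophisticated than a set membership check.

```python
# Placeholder labels, not real model identifiers.
LIGHT_MODEL = "light-model"
FRONTIER_MODEL = "frontier-model"

# Hypothetical task types that are cheap enough for a small model.
SIMPLE_TASKS = {"classification", "tagging", "triage"}


def route(task_type: str) -> str:
    """Pick a model tier; unknown task types fall back to the frontier model."""
    if task_type in SIMPLE_TASKS:
        return LIGHT_MODEL
    return FRONTIER_MODEL


for task in ("classification", "code_generation", "triage"):
    print(f"{task} -> {route(task)}")
```

Falling back to the stronger model for anything unrecognized trades some cost for safety: a misrouted hard task on a small model usually costs more in retries than it saves in tokens.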

And the savings are real. Here’s data from my own testing:

Task Type	Without Smart Routing	With Smart Routing	Savings
Social media monitoring (daily)	67,810 tokens	20,464 tokens	~70%
Code review (5 files)	41,526 tokens	18,736 tokens	~55%
Research summary (10 articles)	85,200 tokens	34,120 tokens	~60%
Daily issue triage (50 items)	52,340 tokens	14,782 tokens	~72%

A quick note: these savings are scenario-specific. Social media monitoring benefits hugely because most of the work is simple categorization, and there’s no need for GPT-4 there. Complex code review still uses frontier models for the hard parts. So don’t expect 70% savings across the board. But on mixed workloads? The savings add up fast.
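
If you want to check the percentages yourself, the arithmetic is just (before - after) / before. Here’s a small Python sketch that recomputes them from the token counts in the table above. The function and variable names are mine; the numbers are straight from my runs.

```python
def savings_pct(before: int, after: int) -> float:
    """Percentage of tokens saved relative to the unrouted run."""
    return (before - after) / before * 100


# (tokens without routing, tokens with routing), copied from the table.
runs = {
    "Social media monitoring": (67_810, 20_464),
    "Code review": (41_526, 18_736),
    "Research summary": (85_200, 34_120),
    "Daily issue triage": (52_340, 14_782),
}

for name, (before, after) in runs.items():
    print(f"{name}: {savings_pct(before, after):.1f}%")

# Combined saving if all four workloads run once each.
total_before = sum(before for before, _ in runs.values())
total_after = sum(after for _, after in runs.values())
combined = savings_pct(total_before, total_after)
print(f"Combined: {combined:.1f}%")
```

Across all four workloads run once each, the combined saving works out to roughly 64%, which is why I wouldn’t quote 70% as a blanket number.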

If you’re looking for more ways to cut AI agent costs, I covered a different approach in my Headroom review — it compresses context at the proxy level instead of routing to cheaper models. Between the two, you can get pretty aggressive with cost optimization.

Always-On Background Execution

Here’s the feature that makes PilotDeck fundamentally different from Claude Code or Cursor: it doesn’t need you present to work.

You assign a task to a WorkSpace and close the browser. PilotDeck keeps running on your server, checking progress, completing tasks, and storing results. Come back an hour, a day, or a week later, and the work is done.

I tested this by setting up a WorkSpace to crawl and summarize a competitor’s documentation site. I started it before lunch, came back after a meeting, and the summary was ready with citations. Nobody sitting at the keyboard. Just the agent, working.

But this only works if you have a server that stays on 24/7.

Deploying PilotDeck on a VPS

You have two options to get PilotDeck running. The one-liner install works great for local testing:

curl -fsSL https://raw.githubusercontent.com/OpenBMB/PilotDeck/main/install.sh | bash

But for the always-on mode — where PilotDeck really shines — you want Docker Compose on a VPS:

# Clone the repo
git clone https://github.com/OpenBMB/PilotDeck.git
cd PilotDeck

# Set your API key in docker-compose.yml, then:
docker compose up -d

And that’s it. Three commands. The Web UI is at http://your-server-ip:3001 and you’re ready to create WorkSpaces.
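
Before opening the browser, I like to confirm that something is actually listening on the port. Here’s a minimal Python sketch using only the standard library. Port 3001 comes from the setup above, while the function name and the placeholder host are my own. Note that it only checks that the TCP port accepts connections, not that the UI itself is healthy.

```python
import socket


def port_is_open(host: str, port: int = 3001, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Call port_is_open("your-server-ip") from your laptop once the containers are up. If it returns False, the usual suspects are a firewall rule on the VPS or a container that hasn’t finished starting.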

Disclosure: Some links below are affiliate links. If you sign up through them, I may earn a commission at no extra cost to you.

Vultr — starts at $6/mo, up to $100 credit for new users
DigitalOcean — $200 credit for new users, enough for ~2 years of PilotDeck hosting

This is where the VPS recommendation comes in. PilotDeck’s always-on mode needs a server that’s running continuously. A $6/month DigitalOcean Droplet handles it easily: I’m running it on one right now with two active WorkSpaces, and resource usage stays under 30%. On paper, the $200 new-user credit DigitalOcean offers would cover more than a year of hosting at that price, though it’s worth checking how long the credit stays valid. Vultr’s credit of up to $100 is another solid option if DO isn’t available in your region.
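
For the curious, here’s the back-of-the-envelope math as a small Python sketch. The $6/month price and the credit amounts come from the offers above. The credit_valid_months parameter is my own assumption to model credits that expire, so check each provider’s terms for the real validity window.

```python
from typing import Optional


def months_covered(credit_usd: float, monthly_usd: float,
                   credit_valid_months: Optional[float] = None) -> float:
    """Months of hosting a credit pays for, capped by its validity window."""
    months = credit_usd / monthly_usd
    if credit_valid_months is not None:
        # An expiring credit can't cover more months than it stays valid.
        months = min(months, credit_valid_months)
    return months


print(f"{months_covered(200, 6):.1f}")  # 33.3
print(f"{months_covered(100, 6):.1f}")  # 16.7
# Hypothetical 12-month validity window, purely for illustration.
print(f"{months_covered(200, 6, credit_valid_months=12):.1f}")  # 12.0
```

The raw division is an upper bound. If a credit expires before you can spend it, the validity window is what actually decides how long the hosting stays free.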

How PilotDeck Stacks Up Against the Competition

Feature	PilotDeck	Claude Code	Cursor	WorkBuddy
Multi-project isolation	✅ Full WorkSpace	❌ Single session	❌ Single session	✅ Project-based
White-box memory	✅ Editable & traceable	❌ Black box	❌ Black box	❌
Smart model routing	✅ Up to 70% savings	❌	❌	❌
Always-on background	✅	❌	❌	❌
Docker self-hosted	✅ `docker compose up`	❌	❌	❌
Open source license	✅ AGPL-3.0	❌	❌	❌
Cost	Free (self-hosted)	$20/mo	$20/mo	Not public

If you’re coming from my DeerFlow review — here’s how they differ. DeerFlow is an agent harness: it orchestrates sub-agents within a single complex task. PilotDeck is an agent operating system: it manages multiple projects side by side, each with its own memory and routing. They’re complementary tools, not replacements.

Who Should Use PilotDeck

You, if:

You manage 3+ AI-assisted projects simultaneously
You’re tired of context pollution across sessions
You want visibility into what your agent actually remembers
You need a long-running, always-on agent on your own infrastructure
You’re evaluating self-hosted AI agent platforms

Maybe not you, if:

You only need one-off coding assistance
You don’t want to manage a server
AGPL-3.0 licensing is a problem for your use case (it has strong copyleft terms)
You need a polished, consumer-grade UI (PilotDeck’s interface is functional but not pretty)

Still, the community is growing fast — 78 open issues and active daily commits suggest this project has momentum. The docs are decent but not comprehensive yet. That’s typical for a 3-week-old project.

The Bottom Line

PilotDeck fills a real gap. Claude Code and Cursor are great for what they do: fast, single-session coding. But if you’re running multiple AI projects in parallel, managing context manually is a time sink. PilotDeck’s WorkSpace isolation, white-box memory, and smart routing solve that problem elegantly.

I’m keeping it running on my VPS. The always-on mode alone (being able to assign a task, walk away, and come back to results) has changed how I approach research-heavy work. With the $200 DigitalOcean credit offsetting the initial hosting cost, there’s not much downside to giving it a try.

If you’re tired of your AI agent forgetting what you talked about five minutes ago — this one’s worth the 10-minute setup.
