Your AI coding agent starts from zero every time you open a new session. It doesn’t remember the bug fix it nailed yesterday. It can’t tell another agent “hey, I already solved this pattern.” Every session starts with total amnesia.
But what if agents could learn from each other? What if one agent’s hard-won debugging experience became a lesson another agent could replay?
That’s the bet behind Agent Apprenticeship, a 520-star project that hit GitHub 48 hours ago and is already at npm v0.1.5 with 244 weekly downloads. It is not another AI agent. It is an infrastructure layer that sits between agents and lets them exchange execution experience.
I installed it on Hermes Agent and ran it through a real task. Here’s what the whole thing looks like today.
The Short Version: Cross-Session Agent Learning
Agent Apprenticeship is the first open infrastructure for cross-session agent experience transfer. It wraps an agent’s task execution into structured Contribution Bundles — think of them as git commits for agent knowledge — that can be shared, reviewed, and replayed by other agents.
This is v0.1.5 with zero forks, though, so treat it as an early experiment that defines a new category rather than a polished product. The ambition is right, and the execution is getting there.
What Is Agent Apprenticeship?
Think of open source. Before GitHub, code lived in silos. Then Git made sharing trivial, and the whole industry accelerated. Agent Apprenticeship wants to do the same for agent execution experience.
Right now, every AI agent session is a closed loop. You give it a task, it does the work, you take the output, and the knowledge evaporates. Even the same agent on the same machine can’t learn from its own past runs unless you manually feed it context.
Agent Apprenticeship creates a standard format, Contribution Bundles, that captures not just what the agent produced but how it got there: the traces, the checkpoints, the decisions, the failures. These bundles can be contributed to a shared ecosystem, and other agents can import them as Experience Packs to bootstrap their own task execution.
The Core Agent Mechanism
The system has four moving parts:
| Concept | Role | What It Does |
|---|---|---|
| Apprentice Agent | Worker | The AI agent executing the task (Claude Code, Codex, Hermes, Cursor, OpenCode) |
| Mentor Mode | Oversight | Three levels: model-assisted (AI reviews), expert-led (human checks), hybrid |
| Experience Packs | Knowledge | Reusable lessons distilled from past Contribution Bundles |
| Contribution Bundles | Artifacts | The complete package: traces, checkpoints, manifests, learning data |
The interesting part is Mentor Mode. A model-assisted run uses an LLM to review the Apprentice’s work at each stage. Expert-led mode pauses at key checkpoints for human approval. I saw this first-hand when my test hit a “task intake” checkpoint and asked me to sign off before the agent started. The approval flow has structure too: you get a rubric to review, not just a vague thumbs-up prompt.
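To make the checkpoint idea concrete, here is a minimal Python sketch of an approval gate. It is not the project’s implementation: the checkpoint names are borrowed from the files I later found under mentor_checkpoints/, while the order, the Decision type, and the run_checkpoints function are my own assumptions for illustration. The reviewer is just a callback, so it could stand in for a human prompt or a model call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Names mirror the files in mentor_checkpoints/; the ordering is an assumption.
CHECKPOINTS = ["task_intake", "rubric", "final_approval"]


@dataclass
class Decision:
    """Hypothetical record of one checkpoint review."""
    checkpoint: str
    approved: bool


def run_checkpoints(reviewer: Callable[[str], bool]) -> List[Decision]:
    """Ask the reviewer about each checkpoint in order; stop at the first rejection."""
    decisions: List[Decision] = []
    for name in CHECKPOINTS:
        approved = reviewer(name)
        decisions.append(Decision(name, approved))
        if not approved:
            break  # later checkpoints are never reached after a rejection
    return decisions
```

Passing a reviewer that rejects the rubric yields two decisions and stops before final approval, which matches the “sign off before the agent continues” behavior I saw.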
Hands-On: Installing and Running Agent Apprenticeship
I tested this on my Windows dev machine with Hermes Agent already installed. Here’s exactly what happened.
Step 1 — npx agent-apprenticeship init --defaults
The install pulled v0.1.5 from npm and auto-detected Hermes Agent. No config files to write, no manual paths. It even found the hermes command without me pointing at it.
Detected Apprentice Agents:
1. Hermes Agent - command found (hermes)
2. Custom - use a custom command template
Configured Apprentice Agent: Hermes Agent
Took about 30 seconds from command to ready. But it did flag that no Mentor Model Provider API key was detected — you need an OpenAI, Anthropic, or OpenRouter key for the model-assisted mode. That’s a real friction point for casual testing.
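If you want to know what the init step will find before you run it, a small pre-flight check helps. The sketch below is my own, not part of the tool: it looks for the hermes command on PATH and checks for provider keys in the environment. The variable names OPENAI_API_KEY, ANTHROPIC_API_KEY, and OPENROUTER_API_KEY are the conventional ones for those providers; whether Agent Apprenticeship reads exactly these names is an assumption.

```python
import os
import shutil
from typing import List, Mapping, Optional

# Assumed env var names: conventional per provider, not confirmed for this tool.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
}


def find_agent(command: str = "hermes") -> Optional[str]:
    """Return the full path of the agent command if it is on PATH, else None."""
    return shutil.which(command)


def available_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """List providers whose key variable is set to a non-empty value."""
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]
```

If available_providers() comes back empty, expert-led mode is the only option that will run without extra setup.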
Step 2 — Running a task
I fed it: “Write a Python script that lists all files in a directory recursively.” In expert-led mode (no API key needed), the system created a full workspace with task intake, rubric, and checkpoint approval stages. Here’s the flow I walked through:
- Task intake — I approved the task description before execution started
- Rubric — I approved the success criteria the system generated
- Apprentice attempt — Hermes ran the task, but it failed on output contract: the tool expects specific agent_trace.json and actual_outputs.json files that Hermes didn’t produce
- Contribution Bundle — Even on a failed run, the system generated a complete bundle with session metadata, checkpoints, traces, and a contribution manifest
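The output-contract failure is easy to reproduce as a check of your own. This is a hedged sketch rather than the tool’s validator: the two file names come from the failure I hit, but I don’t know which directory the tool inspects, so the helper (missing_outputs, a name I made up) takes the directory as an argument.

```python
from pathlib import Path
from typing import List

# File names come from the failure described above; where the tool looks for
# them is an assumption, so the caller supplies the directory.
REQUIRED_OUTPUTS = ("agent_trace.json", "actual_outputs.json")


def missing_outputs(attempt_dir: Path) -> List[str]:
    """Return the required output files that are absent from attempt_dir."""
    return [name for name in REQUIRED_OUTPUTS if not (attempt_dir / name).is_file()]
```

Running this against the agent’s working directory before handing control back would surface the gap earlier, which is useful if you are wiring up a custom command template.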
Step 3 — What’s in a Contribution Bundle
Let me show you the directory structure of what came out:
contribution_bundle/
├── contribution_card.md
├── contribution_manifest.json
├── session_metadata.json
├── session_events.jsonl
├── mentor_checkpoints/
│ ├── task_intake_checkpoint.json
│ ├── rubric_checkpoint.json
│ └── final_approval_checkpoint.json
├── attempts/
├── traces/
├── learning_data/
└── evaluation/
That’s a lot of structure for a task that failed. The session_events.jsonl logs every step. The contribution_manifest.json records the attempt count, mentor mode, traced steps, and failure reason. Even a failed run produces training signal, which is exactly the point.
And here’s the key insight: the bundle format doesn’t change whether the task succeeded or failed. A failure becomes a data point about what not to do. That’s loosely the logic behind preference-based training such as reinforcement learning from human feedback, where rejected outputs carry signal alongside preferred ones.
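Because the layout is the same for passing and failing runs, a bundle can be sanity-checked mechanically. Here is a small sketch, assuming only the top-level entry names from the listing above and that session_events.jsonl holds one JSON object per line. It deliberately avoids guessing at field names inside the manifest, since I haven’t seen a documented schema; inspect_bundle is a hypothetical helper, not a tool command.

```python
import json
from pathlib import Path
from typing import Dict, List

# Top-level entries taken from the directory listing above.
EXPECTED_ENTRIES: List[str] = [
    "contribution_card.md",
    "contribution_manifest.json",
    "session_metadata.json",
    "session_events.jsonl",
    "mentor_checkpoints",
    "attempts",
    "traces",
    "learning_data",
    "evaluation",
]


def inspect_bundle(bundle_dir: Path) -> Dict[str, object]:
    """Report missing top-level entries and count the logged session events."""
    missing = [e for e in EXPECTED_ENTRIES if not (bundle_dir / e).exists()]
    event_count = 0
    events_file = bundle_dir / "session_events.jsonl"
    if events_file.is_file():
        with events_file.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    json.loads(line)  # raises ValueError on a malformed line
                    event_count += 1
    return {"missing": missing, "event_count": event_count}
```

Calling inspect_bundle(Path("contribution_bundle")) on a run gives a quick answer to “is this bundle complete, and how much did it log?” without opening every file.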
The Seed Dataset: 500+ Training Tasks for Agents
The project ships with 500+ curated seed tasks, 495 reusable agent lessons, and over 1000 agent execution traces. That seed dataset is the core asset: it gives the ecosystem a starting library of experience that any new agent can pull from using apprentice learn. Think of it as the first training curriculum for agents that haven’t run any real tasks yet.
| Asset | Count | Purpose |
|---|---|---|
| Curated seed tasks | 500+ | Training set for Apprentice Agents |
| Reusable agent lessons | 495 | Bite-sized experience packs |
| Agent execution traces | 1000+ | Full run logs for replay |
Whether these are genuinely useful or just volume fillers is something I can’t fully judge without spending more time with the dataset. But 500 tasks across potentially diverse domains make this more than a toy project.
How Agent Apprenticeship Compares to the Ecosystem
I’ve been tracking the agent tool ecosystem closely for ToolGenix. Here’s where Agent Apprenticeship fits:
| Dimension | Agent Apprenticeship | agent-skills | SkillSpector | Omnigent |
|---|---|---|---|---|
| Core focus | Cross-session learning | In-session skill commands | Security scanning | Agent orchestration |
| Key output | Contribution Bundles | Slash commands | Vulnerability scores | Session management |
| VPS deploy | ❌ CLI only | ❌ CLI only | ❌ CLI only | ✅ Docker |
| Hermes support | ✅ Listed | ✅ Compatible | ✅ Claude/Codex | N/A |
| Maturity | v0.1.5, 0 forks | 52.5k★, mature | 26k★, established | N/A |
The closest cousin is agent-skills, which also addresses agent quality. The difference is scope: agent-skills focuses on quality within a single session (spec-first, test-drive, review-before-merge), while Agent Apprenticeship focuses on learning across sessions and across agents. They’re complementary rather than competitive.
SkillSpector sits in a different lane entirely — security scanning for agent skills. Different problem, different tool.
Who Should Try This
You’re the target audience if:
- You use Claude Code, Codex, Cursor, or Hermes Agent daily for complex multi-step tasks
- You’ve wished your agent could remember how it solved something last week
- You’re interested in agent-to-agent knowledge transfer, even in early form
- You’re comfortable with command-line tools and reading JSON manifests
You should probably wait if:
- You expect a polished, consumer-ready product
- You don’t want to configure an LLM API key for model-assisted mode
- Your agent work is simple one-shot prompts that don’t benefit from structured workflows
The Bottom Line
Agent Apprenticeship is defining a new category. The idea, agents that learn from execution experience and share that knowledge, is the right direction. The seed dataset gives it a real head start, and the structural rigor (checkpoints, manifests, traces) is impressive for a v0.1.x project.
But there are caveats. Zero forks. Hermes integration failed on output contract in my test. Expert-led mode requires manual checkpoints that feel like overhead for simple tasks. Model-assisted mode needs an API key. This is an alpha-grade infrastructure project, not a tool you install and forget.
Still, I’d rather be early to this conversation than late. If agent learning ecosystems become a thing, and I think they will, Agent Apprenticeship is the first serious attempt I’ve seen. It’s worth watching, worth testing, and worth contributing to if the model fits your stack.