Your AI coding agent starts from zero every time you open a new session. It doesn’t remember the bug fix it nailed yesterday. It can’t tell another agent “hey, I already solved this pattern.” Every conversation starts with a fresh amnesiac.

But what if agents could learn from each other? What if one agent’s hard-won debugging experience became a lesson another agent could replay?

That’s the bet behind Agent Apprenticeship, a 520-star project that hit GitHub 48 hours ago and is already at npm v0.1.5 with 244 weekly downloads. It isn’t another AI agent. It’s an infrastructure layer between agents that lets them exchange execution experience.

I installed it on Hermes Agent and ran it through a real task. Here’s what the whole thing looks like today.

The Short Version: Cross-Session Agent Learning

Agent Apprenticeship is the first open infrastructure for cross-session agent experience transfer. It wraps an agent’s task execution into structured Contribution Bundles — think of them as git commits for agent knowledge — that can be shared, reviewed, and replayed by other agents.

That said, this is v0.1.5 with zero forks. It’s an early experiment that defines a new category rather than a polished product. The ambition is right, and the execution is getting there.

What Is Agent Apprenticeship?

Think of open source. Before Git and GitHub, code lived in silos. Then they made sharing trivial, and the whole industry accelerated. Agent Apprenticeship wants to do the same for agent execution experience.

Right now, every AI agent session is a closed loop. You give it a task, it does the work, you take the output, and the knowledge evaporates. Even the same agent on the same machine can’t learn from its own past runs unless you manually feed it context.

Agent Apprenticeship creates a standard format, the Contribution Bundle, that captures not just what the agent produced but how it got there: the traces, the checkpoints, the decisions, the failures. These bundles can be contributed to a shared ecosystem, and other agents can import them as Experience Packs to bootstrap their own task execution.

The Core Agent Mechanism

The system has four moving parts:

| Concept | Role | What It Does |
|---|---|---|
| Apprentice Agent | Worker | The AI agent executing the task (Claude Code, Codex, Hermes, Cursor, OpenCode) |
| Mentor Mode | Oversight | Three levels: model-assisted (AI reviews), expert-led (human checks), hybrid |
| Experience Packs | Knowledge | Reusable lessons distilled from past Contribution Bundles |
| Contribution Bundles | Artifacts | The complete package: traces, checkpoints, manifests, learning data |
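To make the relationships concrete, here is a minimal Python sketch of how those four parts might fit together as data. Every class and field name (MentorMode, ContributionBundle, ExperiencePack, distill) is hypothetical, chosen for illustration rather than taken from the project’s code, and the distill heuristic is a toy stand-in for however the project actually turns bundles into lessons.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class MentorMode(Enum):
    # The three oversight levels described for Mentor Mode.
    MODEL_ASSISTED = "model-assisted"  # an LLM reviews each stage
    EXPERT_LED = "expert-led"          # a human approves key checkpoints
    HYBRID = "hybrid"                  # a mix of model and human review


@dataclass
class ContributionBundle:
    # Hypothetical fields: the raw record of one Apprentice run.
    task: str
    apprentice: str
    mentor_mode: MentorMode
    traces: List[str] = field(default_factory=list)             # steps the agent took
    checkpoints: Dict[str, bool] = field(default_factory=dict)  # stage -> approved?
    succeeded: bool = False
    failure_reason: Optional[str] = None


@dataclass
class ExperiencePack:
    # Hypothetical: a reusable lesson distilled from past bundles.
    lesson: str
    source_tasks: List[str]


def distill(bundle: ContributionBundle) -> ExperiencePack:
    """Turn one bundle into a one-line lesson (toy heuristic, not the project's method)."""
    if bundle.succeeded:
        lesson = f"Approach for '{bundle.task}': " + " -> ".join(bundle.traces)
    else:
        # Failed runs still yield a lesson: what went wrong.
        lesson = f"Pitfall in '{bundle.task}': {bundle.failure_reason}"
    return ExperiencePack(lesson=lesson, source_tasks=[bundle.task])


if __name__ == "__main__":
    failed = ContributionBundle(
        task="List files recursively",
        apprentice="Hermes Agent",
        mentor_mode=MentorMode.EXPERT_LED,
        checkpoints={"task_intake": True, "rubric": True},
        failure_reason="missing agent_trace.json",
    )
    print(distill(failed).lesson)
    # Pitfall in 'List files recursively': missing agent_trace.json
```

The point of the sketch is the direction of flow: bundles are raw records of a run, and packs are the compact, reusable lessons derived from them.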

The most interesting part is Mentor Mode. A model-assisted run uses an LLM to review the Apprentice’s work at each stage. Expert-led mode pauses at key checkpoints for human approval. I saw this first-hand when my test hit a “task intake” checkpoint and asked me to sign off before the agent started. Even the approval flow has structure: you get a rubric to review, not just a vague thumbs-up prompt.
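The checkpoint idea maps onto a simple gating loop. The sketch below is an assumption about the shape of the flow, not the project’s implementation: run_with_checkpoints, human_reviewer, and stub_model_reviewer are hypothetical names, and the stub only stands in for a real LLM review call. The stage names echo the checkpoint files that show up in the Contribution Bundle later in this review.

```python
from typing import Callable, List, Tuple

# A reviewer takes (stage name, artifact text) and returns True to approve.
Reviewer = Callable[[str, str], bool]


def human_reviewer(stage: str, artifact: str) -> bool:
    # Expert-led: block until a person answers at the terminal.
    answer = input(f"[{stage}]\n{artifact}\nApprove? (y/n) ")
    return answer.strip().lower() == "y"


def stub_model_reviewer(stage: str, artifact: str) -> bool:
    # Stand-in for an LLM review call; it only rejects empty artifacts.
    return bool(artifact.strip())


def run_with_checkpoints(
    stages: List[Tuple[str, str]], reviewers: List[Reviewer]
) -> List[str]:
    """Walk stages in order and stop at the first rejected checkpoint.

    Every reviewer must approve a stage before it counts as passed.
    Returns the names of the stages that were approved.
    """
    approved: List[str] = []
    for stage, artifact in stages:
        if not all(review(stage, artifact) for review in reviewers):
            break  # a rejection halts the run, like an unsigned checkpoint
        approved.append(stage)
    return approved


if __name__ == "__main__":
    stages = [
        ("task_intake", "Write a Python script that lists all files recursively."),
        ("rubric", "Prints every file path under the given directory."),
        ("final_approval", ""),  # empty artifact: the stub reviewer rejects it
    ]
    print(run_with_checkpoints(stages, [stub_model_reviewer]))
    # ['task_intake', 'rubric']
```

For an expert-led run you would pass [human_reviewer]; for a hybrid run, something like [stub_model_reviewer, human_reviewer]. Treating hybrid as “every reviewer must approve” is my assumption, not documented behavior.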

Hands-On: Installing and Running Agent Apprenticeship

I tested this on my Windows dev machine with Hermes Agent already installed. Here’s exactly what happened.

Step 1 — npx agent-apprenticeship init --defaults

The install pulled v0.1.5 from npm and auto-detected Hermes Agent. No config files to write, no manual paths: it found the hermes command without me pointing at it.

Detected Apprentice Agents:
1. Hermes Agent - command found (hermes)
2. Custom - use a custom command template
Configured Apprentice Agent: Hermes Agent

It took about 30 seconds from command to ready. It did flag that no Mentor Model Provider API key was detected, though: model-assisted mode needs an OpenAI, Anthropic, or OpenRouter key. That’s a real friction point for casual testing.
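Here is a rough sketch of the kind of check behind that warning. The environment variable names are the conventional ones each provider’s SDK reads; whether agent-apprenticeship looks for exactly these, and whether it falls back to expert-led mode when none is set, are assumptions on my part. detect_mentor_provider is a hypothetical name.

```python
import os
from typing import Mapping, Optional, Tuple

# Assumed variable names: the conventional ones each provider's SDK reads.
# agent-apprenticeship may look for different names.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
}


def detect_mentor_provider(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[str], str]:
    """Return (provider, mode) for the first provider with a non-empty key.

    With no key found, fall back to expert-led mode (an assumed behavior).
    """
    env = os.environ if env is None else env
    for provider, var in PROVIDER_KEYS.items():
        if env.get(var):
            return provider, "model-assisted"
    return None, "expert-led"


if __name__ == "__main__":
    provider, mode = detect_mentor_provider()
    print(f"Mentor provider: {provider or 'none detected'} -> {mode} mode")
```

Passing the environment in as a mapping keeps the check testable without touching real keys.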

Step 2 — Running a task

I fed it: “Write a Python script that lists all files in a directory recursively.” In expert-led mode (no API key needed), the system created a full workspace with task intake, rubric, and checkpoint approval stages. Here’s the flow I walked through:

  1. Task intake: I approved the task description before execution started
  2. Rubric: I approved the success criteria the system generated
  3. Apprentice attempt: Hermes ran the task but failed the output contract, because the tool expects specific agent_trace.json and actual_outputs.json files that Hermes didn’t produce
  4. Contribution Bundle: even on a failed run, the system generated a complete bundle with session metadata, checkpoints, traces, and a contribution manifest
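The apprentice attempt (item 3 above) is where the run broke, so here is a minimal sketch of a pre-flight check an Apprentice wrapper could run before handing off. The two file names come from the failure in my test. The function name, the assumption that both files sit in a single attempt directory, and the choice to verify only presence and JSON well-formedness are mine, since I couldn’t confirm the expected schema.

```python
import json
from pathlib import Path
from typing import List, Union

# File names from the failed run; where they must live is an assumption.
REQUIRED_OUTPUTS = ("agent_trace.json", "actual_outputs.json")


def check_output_contract(attempt_dir: Union[str, Path]) -> List[str]:
    """Return contract violations for one attempt directory (empty list = pass).

    Only presence and JSON well-formedness are checked, because the
    expected schema of each file isn't documented here.
    """
    problems: List[str] = []
    root = Path(attempt_dir)
    for name in REQUIRED_OUTPUTS:
        path = root / name
        if not path.is_file():
            problems.append(f"missing {name}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{name} is not valid JSON: {exc.msg}")
    return problems
```

Run against the attempt directory from my test, a check like this would have reported both files as missing before the bundle was ever assembled.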

Step 3 — What’s in a Contribution Bundle

Let me show you the directory structure of what came out:

contribution_bundle/
├── contribution_card.md
├── contribution_manifest.json
├── session_metadata.json
├── session_events.jsonl
├── mentor_checkpoints/
│   ├── task_intake_checkpoint.json
│   ├── rubric_checkpoint.json
│   └── final_approval_checkpoint.json
├── attempts/
├── traces/
├── learning_data/
└── evaluation/

That’s a lot of structure for a task that failed. The session_events.jsonl file logs every step. The contribution_manifest.json file records the attempt count, mentor mode, traced steps, and failure reason. Even a failed run produces training signal, which is exactly the point.
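To show how manifest fields like these could be derived from the event log, here is a sketch that folds a JSON Lines file into an attempt count, a traced-step count, and a failure reason. The event schema (a "type" key with values such as attempt_started, trace_step, and attempt_failed, plus a "reason" on failures) and the function name summarize_session are hypothetical. I didn’t reverse-engineer the real format.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def summarize_session(events_path: Union[str, Path], mentor_mode: str) -> Dict[str, Any]:
    """Fold a JSON Lines event log into manifest-style fields.

    Hypothetical event schema: one JSON object per line with a "type" key;
    "attempt_failed" events may carry a "reason".
    """
    summary: Dict[str, Any] = {
        "mentor_mode": mentor_mode,
        "attempt_count": 0,
        "traced_steps": 0,
        "failure_reason": None,
    }
    with Path(events_path).open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines in the log
            event = json.loads(line)
            kind = event.get("type")
            if kind == "attempt_started":
                summary["attempt_count"] += 1
            elif kind == "trace_step":
                summary["traced_steps"] += 1
            elif kind == "attempt_failed":
                summary["failure_reason"] = event.get("reason")
    return summary
```

Because the summary is rebuilt from the append-only log, the same function works whether the run succeeded or failed; only failure_reason changes.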

Here’s the key insight: the bundle format doesn’t change whether the task succeeded or failed. A failure becomes a data point about what not to do. It’s similar in spirit to preference-based training such as reinforcement learning from human feedback, where rejected outputs carry signal alongside preferred ones.

The Seed Dataset: 500+ Training Tasks for Agents

The project ships with 500+ curated seed tasks, 495 reusable agent lessons, and more than 1,000 agent execution traces. That seed dataset is the core asset: it gives the ecosystem a starting library of experience that any new agent can pull from using apprentice learn. Think of it as a first training curriculum for agents that haven’t run any real tasks yet.
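I don’t know how apprentice learn decides which lessons apply to a new task. As one plausible approach, here is a sketch that ranks lessons by word overlap (Jaccard similarity) with the task description. rank_lessons and tokens are hypothetical names, and a production system might use embeddings or a curated index instead.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    # Lowercase word set; crude but dependency-free.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_lessons(task: str, lessons: List[str], top_k: int = 3) -> List[str]:
    """Return up to top_k lessons ordered by Jaccard similarity to the task.

    Lessons with zero word overlap are dropped; ties keep input order.
    """
    task_tokens = tokens(task)
    scored = []
    for lesson in lessons:
        lesson_tokens = tokens(lesson)
        union = task_tokens | lesson_tokens
        score = len(task_tokens & lesson_tokens) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, lesson))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [lesson for _, lesson in scored[:top_k]]
```

Word overlap is crude: a lesson like “Pin dependency versions in CI” outranks a genuinely relevant pathlib lesson for a file-listing task just by sharing the word “in”. That is one reason to read this as an illustration of lesson lookup rather than a recipe.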

| Asset | Count | Purpose |
|---|---|---|
| Curated seed tasks | 500+ | Training set for Apprentice Agents |
| Reusable agent lessons | 495 | Bite-sized experience packs |
| Agent execution traces | 1,000+ | Full run logs for replay |

Whether these are genuinely useful or just volume filler is something I can’t fully judge without spending more time with the dataset. Still, 500 tasks across potentially diverse domains make this more than a toy project.

How Agent Apprenticeship Compares to the Ecosystem

I’ve been tracking the agent tool ecosystem closely for ToolGenix. Here’s where Agent Apprenticeship fits:

| Dimension | Agent Apprenticeship | agent-skills | SkillSpector | Omnigent |
|---|---|---|---|---|
| Core focus | Cross-session learning | In-session skill commands | Security scanning | Agent orchestration |
| Key output | Contribution Bundles | Slash commands | Vulnerability scores | Session management |
| VPS deploy | ❌ CLI only | ❌ CLI only | ❌ CLI only | ✅ Docker |
| Hermes support | ✅ Listed | ✅ Compatible | ✅ Claude/Codex | N/A |
| Maturity | v0.1.5, 0 forks | 52.5k★, mature | 26k★, established | N/A |

The closest cousin is agent-skills, which also addresses agent quality. The difference is scope: agent-skills works within a single session (spec-first, test-driven, review-before-merge), while Agent Apprenticeship works across sessions and across agents. They’re complementary rather than competitive.

SkillSpector sits in a different lane entirely — security scanning for agent skills. Different problem, different tool.

Who Should Try This

You’re the target audience if:

  • You use Claude Code, Codex, Cursor, or Hermes Agent daily for complex multi-step tasks
  • You’ve wished your agent could remember how it solved something last week
  • You’re interested in agent-to-agent knowledge transfer, even in early form
  • You’re comfortable with command-line tools and reading JSON manifests

You should probably wait if:

  • You expect a polished, consumer-ready product
  • You don’t want to configure an LLM API key for model-assisted mode
  • Your agent work is simple one-shot prompts that don’t benefit from structured workflows

The Bottom Line

Agent Apprenticeship is defining a new category, and its core idea, agents that learn from execution experience and share that knowledge, is the right direction. The seed dataset gives it a real head start, and the structural rigor (checkpoints, manifests, traces) is impressive for a v0.1.x project.

The caveats are real. Zero forks. The Hermes integration failed the output contract in my test. Expert-led mode requires manual checkpoints that feel like overhead for simple tasks. Model-assisted mode needs an API key. This is an alpha-grade infrastructure project, not a tool you install and forget.

Even so, I’d rather be early to this conversation than late. If agent learning ecosystems become a thing, and I think they will, Agent Apprenticeship is the first serious attempt I’ve seen. It’s worth watching, worth testing, and worth contributing to if the model fits your stack.

If you want to go deeper on how agents handle complex workflows, these are two solid reads: