<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Self-Hosting on ToolGenix — AI Tools Discovery &amp; Reviews</title>
    <link>https://toolgenix.nxtniche.com/tags/self-hosting/</link>
    <description>Recent content in Self-Hosting on ToolGenix — AI Tools Discovery &amp; Reviews</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 08 Jun 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://toolgenix.nxtniche.com/tags/self-hosting/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>How to Deploy Hermes Agent on Your Own VPS: Step-by-Step Guide (2026)</title>
      <link>https://toolgenix.nxtniche.com/posts/hermes-vps-deployment-guide/</link>
      <pubDate>Mon, 08 Jun 2026 00:00:00 +0000</pubDate>
      <guid>https://toolgenix.nxtniche.com/posts/hermes-vps-deployment-guide/</guid>
      <description>Step-by-step guide to deploying Hermes Agent on a $6/mo VPS — open-source AI agent with 185k&#43; GitHub stars, persistent memory, Kanban scheduling, and full data control.</description>
      <content:encoded><![CDATA[<h1 id="how-to-deploy-hermes-agent-on-your-own-vps-step-by-step-guide-2026">How to Deploy Hermes Agent on Your Own VPS: Step-by-Step Guide (2026)</h1>
<p><strong>TL;DR:</strong> Deploy Hermes Agent on a $6/mo VPS — open-source AI agent with 185k+ GitHub stars, persistent memory, and Kanban task scheduling. Own your automation stack with no lock-in and no data leaving your server.</p>
<h2 id="why-self-host-hermes-agent">Why Self-Host Hermes Agent?</h2>
<p>Here&rsquo;s the problem with SaaS AI agents: you pay per seat, your data lives on someone else&rsquo;s server, and you&rsquo;re locked into whatever features they decide to ship. Self-hosting Hermes Agent flips that — one VPS, unlimited users in your team, full control over which models you use, and your conversation history stays on hardware you control.</p>
<p>I&rsquo;ve been running Hermes Agent on a $6/mo DigitalOcean Droplet for the past three months, and it handles everything from daily news summarization (via cron jobs) to GitHub PR reviews (via the Kanban pipeline). The agent never sleeps, never asks for a credit card top-up, and the active community pushes updates almost daily.</p>
<table>
	<thead>
			<tr>
					<th style="text-align: left">Feature</th>
					<th style="text-align: center">Hermes Agent (Self-Hosted)</th>
					<th style="text-align: center">SaaS AI Agent (e.g. ChatGPT Teams)</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left">Monthly cost</td>
					<td style="text-align: center">$6–12 VPS</td>
					<td style="text-align: center">$25–$60 per seat</td>
			</tr>
			<tr>
					<td style="text-align: left">Data residency</td>
					<td style="text-align: center">Your VPS</td>
					<td style="text-align: center">Provider&rsquo;s cloud</td>
			</tr>
			<tr>
					<td style="text-align: left">Model choice</td>
					<td style="text-align: center">Any API (DeepSeek/OpenAI/Anthropic)</td>
					<td style="text-align: center">Provider&rsquo;s model only</td>
			</tr>
			<tr>
					<td style="text-align: left">Users per account</td>
					<td style="text-align: center">Unlimited (SSH/WebUI)</td>
					<td style="text-align: center">Per-seat billing</td>
			</tr>
			<tr>
					<td style="text-align: left">Skills/plugins</td>
					<td style="text-align: center">Open marketplace</td>
					<td style="text-align: center">Closed ecosystem</td>
			</tr>
			<tr>
					<td style="text-align: left">Persistent memory</td>
					<td style="text-align: center">Hindsight (self-hosted)</td>
					<td style="text-align: center">Provider-managed</td>
			</tr>
	</tbody>
</table>
<p>So if you&rsquo;re a solo developer, a small team, or anyone who values data privacy and predictable costs, self-hosting is the way to go.</p>
<h2 id="what-youll-need-to-deploy-hermes-agent">What You&rsquo;ll Need to Deploy Hermes Agent</h2>
<p>Before we start, make sure you have:</p>
<table>
	<thead>
			<tr>
					<th style="text-align: left">Requirement</th>
					<th style="text-align: center">Recommended Spec</th>
					<th style="text-align: left">Notes</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left">VPS</td>
					<td style="text-align: center">1 vCPU, 2GB RAM, 25GB SSD</td>
					<td style="text-align: left">$6/mo DigitalOcean Droplet or $6/mo Vultr instance</td>
			</tr>
			<tr>
					<td style="text-align: left">OS</td>
					<td style="text-align: center">Ubuntu 22.04 LTS or Debian 12</td>
					<td style="text-align: left">Step 2 uses the deadsnakes PPA on Ubuntu; Debian 12 already ships Python 3.11</td>
			</tr>
			<tr>
					<td style="text-align: left">Python</td>
					<td style="text-align: center">3.11 or 3.12</td>
					<td style="text-align: left">Hermes requires Python 3.10–3.12</td>
			</tr>
			<tr>
					<td style="text-align: left">Domain (optional)</td>
					<td style="text-align: center">Any DNS-managed domain</td>
					<td style="text-align: left">Only needed for a named Cloudflare Tunnel on your own hostname; a quick tunnel gives HTTPS without one</td>
			</tr>
			<tr>
					<td style="text-align: left">API Key</td>
					<td style="text-align: center">DeepSeek/OpenAI/Anthropic</td>
					<td style="text-align: left">At least one provider key for the agent to function</td>
			</tr>
	</tbody>
</table>
<p><strong>My recommendation:</strong> Start with a <a href="https://toolgenix.nxtniche.com/go/vultr">Vultr $6/mo instance</a> (2GB RAM, 1 vCPU). If you hit memory limits during heavy skill usage, scale to the $12/mo plan. I started on a $6 plan and only upgraded after I added six concurrent cron jobs.</p>
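<p>To check how close you are to those memory limits, look at available RAM rather than free RAM. A minimal sketch, assuming a Linux VPS that exposes <code>/proc/meminfo</code>; the <code>mem_available_mb</code> helper and the 300 MB warning threshold are my own illustrative choices, not anything Hermes ships:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"># Hypothetical helper: print MemAvailable from /proc/meminfo (reported in kB)
# as whole megabytes, rounded down. Pass another file path to test it.
mem_available_mb() {
  awk '$1 == "MemAvailable:" { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# The 300 MB threshold is arbitrary; tune it to your own skills and cron jobs.
avail="$(mem_available_mb)"
if [ "${avail:-0}" -lt 300 ]; then
  echo "Only ${avail:-0} MB available: add swap or move up a plan."
else
  echo "${avail} MB available."
fi</code></pre></div>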
<hr>
<h2 id="step-1-provision-your-vps">Step 1: Provision Your VPS</h2>
<!-- BEGIN AFFILIATE LINKS (generated by ads-center for Hermes VPS Deployment Guide) -->
<div class="affiliate-block">
<p><strong>👉 Get your VPS here (both offer free credits for new users):</strong></p>
<ul>
  <li><a href="https://toolgenix.nxtniche.com/go/do" rel="nofollow sponsored" target="_blank">DigitalOcean</a> — <strong>$200 credit</strong> for 60 days on new accounts. The $6/mo Droplet (2GB RAM, 1 vCPU, 25GB SSD) handles Hermes Agent with room to spare.</li>
  <li><a href="https://toolgenix.nxtniche.com/go/vultr" rel="nofollow sponsored" target="_blank">Vultr</a> — <strong>$50–$100 credit</strong> for new users. Same price tier, great alternative if you prefer the Vultr control panel or want more global data center options.</li>
</ul>
<p><em>Disclosure: If you sign up through these links, I may earn a commission at no extra cost to you. I personally use both providers in production and recommend them based on real experience.</em></p>
</div>
<!-- END AFFILIATE LINKS -->
<p>This is the only step with a fixed monthly cost (API usage is billed separately by your provider), and it matters most: pick a reliable provider so you&rsquo;re not rebuilding your agent when the VPS goes down.</p>
<h3 id="option-a-vultr-recommended">Option A: Vultr (Recommended)</h3>
<p><a href="https://toolgenix.nxtniche.com/go/vultr" rel="nofollow sponsored" target="_blank">Vultr</a> is my top pick for Hermes deployment. Setup takes a few minutes:</p>
<ol>
<li>Sign up at <strong>Vultr</strong> — new users get <strong>$50–$100 credit</strong> on their first deposit</li>
<li>Deploy a cloud instance with:
<ul>
<li><strong>Ubuntu 22.04 LTS</strong></li>
<li><strong>$6/mo plan</strong> (2GB RAM, 1 vCPU, 25GB SSD)</li>
<li>Add your SSH key for passwordless login</li>
</ul>
</li>
<li>Note the instance IP address</li>
<li>SSH in: <code>ssh root@&lt;your-instance-ip&gt;</code></li>
</ol>
<p>Vultr has 32 data center locations worldwide — so you can pick one closest to you for the lowest latency. Their NVMe SSD storage is fast enough for Hermes&rsquo;s Hindsight memory database.</p>
<h3 id="option-b-digitalocean-alternative">Option B: DigitalOcean (Alternative)</h3>
<p><a href="https://toolgenix.nxtniche.com/go/do" rel="nofollow sponsored" target="_blank">DigitalOcean</a> also offers a $6/mo Droplet and is a solid choice, especially in North America. The deployment steps are identical once you have SSH access.</p>
<blockquote>
<p><strong>Pro tip from my experience:</strong> Enable automatic backups ($1/mo extra) on your VPS. When I accidentally broke my Hermes config while experimenting with a custom skill, having a backup saved me a full reinstall. Worth every penny.</p>
</blockquote>
<hr>
<h2 id="step-2-install-python-311--uv">Step 2: Install Python 3.11 + uv</h2>
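<p>Modern Hermes Agent installs with <code>uv</code>, a fast Python package manager written in Rust. Ubuntu 22.04&rsquo;s system Python is 3.10, so install a clean 3.11 from the deadsnakes PPA instead of relying on it. (The PPA doesn&rsquo;t apply to Debian 12, but there the system <code>python3</code> is already 3.11.)</p>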
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Update system packages</span>
</span></span><span style="display:flex;"><span>apt update <span style="color:#f92672">&amp;&amp;</span> apt upgrade -y
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Install Python 3.11</span>
</span></span><span style="display:flex;"><span>apt install -y software-properties-common
</span></span><span style="display:flex;"><span>add-apt-repository -y ppa:deadsnakes/ppa
</span></span><span style="display:flex;"><span>apt install -y python3.11 python3.11-venv python3.11-dev
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Leave /usr/bin/python3 alone: Ubuntu&#39;s apt helpers (add-apt-repository etc.) expect the stock 3.10</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Create virtual environments with python3.11 explicitly instead of changing the default</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Install uv</span>
</span></span><span style="display:flex;"><span>curl -LsSf https://astral.sh/uv/install.sh | sh
</span></span><span style="display:flex;"><span>source ~/.bashrc
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Verify</span>
</span></span><span style="display:flex;"><span>python3.11 --version   <span style="color:#75715e"># Should show Python 3.11.x</span>
</span></span><span style="display:flex;"><span>uv --version        <span style="color:#75715e"># Should show uv 0.4.x or newer</span>
</span></span></code></pre></div><p>Look, I made this mistake myself. In my first deployment I used the system Python 3.10 from Ubuntu&rsquo;s default repo. Everything worked until I tried to install a skill that required 3.11+. So save yourself the headache — go with 3.11 from the start.</p>
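<p>If you script your provisioning, that version check can be automated. The sketch below assumes the 3.10&ndash;3.12 range from the requirements table; <code>python_supported</code> is a hypothetical helper name, not part of Hermes:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"># Hypothetical helper: succeed if a version string such as "3.11.9" falls
# inside the 3.10-3.12 range this guide lists for Hermes.
python_supported() {
  local version="$1" major rest minor
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  [ "$major" = "3" ] || return 1
  # Reject empty or non-numeric minor versions before comparing numbers
  case "$minor" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$minor" -ge 10 ] || return 1
  [ "$minor" -le 12 ]
}

# Ask the interpreter from Step 2 for its exact version (empty if it is missing).
ver="$(python3.11 -c 'import platform; print(platform.python_version())' || true)"
if python_supported "$ver"; then
  echo "python3.11 reports $ver: fine for Hermes"
else
  echo "python3.11 missing or outside 3.10-3.12 (got: ${ver:-nothing})"
fi</code></pre></div>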
<hr>
<h2 id="step-3-clone-and-install-hermes-agent">Step 3: Clone and Install Hermes Agent</h2>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>cd /opt
</span></span><span style="display:flex;"><span>git clone https://github.com/NousResearch/hermes-agent
</span></span><span style="display:flex;"><span>cd hermes-agent
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Create virtual environment and install</span>
</span></span><span style="display:flex;"><span>uv venv --python 3.11
</span></span><span style="display:flex;"><span>source .venv/bin/activate
</span></span><span style="display:flex;"><span>uv pip install -e .
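</span></span></code></pre></div><p>The <code>-e</code> flag installs in editable mode, so the environment runs straight from the cloned source. Updating is just <code>git pull &amp;&amp; uv pip install -e .</code>; the reinstall only does real work when dependencies change, so there&rsquo;s no rebuild step.</p>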
<hr>
<h2 id="step-4-configure-hermes-agent-api-providers">Step 4: Configure Hermes Agent API Providers</h2>
<p>Hermes needs at least one LLM provider to function. Run the setup wizard:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>hermes setup
</span></span></code></pre></div><p>This prompts you for:</p>
<ul>
<li><strong>Primary provider</strong> — I use DeepSeek (cheapest, ~$0.14/M input tokens) for most tasks and fall back to Claude for complex reasoning</li>
<li><strong>API key</strong> — Paste your key (it&rsquo;s stored locally in <code>~/.hermes/config.yaml</code>)</li>
<li><strong>Default model</strong> — The model used for general tasks</li>
</ul>
<p>Or if you prefer manual configuration, edit <code>~/.hermes/config.yaml</code> directly:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">providers</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">deepseek</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">api_key</span>: <span style="color:#e6db74">&#34;***&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">models</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default</span>: <span style="color:#e6db74">&#34;deepseek-chat&#34;</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">openai</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">api_key</span>: <span style="color:#e6db74">&#34;***&#34;</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">models</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">default</span>: <span style="color:#e6db74">&#34;gpt-4o&#34;</span>
</span></span></code></pre></div><table>
	<thead>
			<tr>
					<th style="text-align: left">Provider</th>
					<th style="text-align: center">Cost per 1M input tokens</th>
					<th style="text-align: left">Best For</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left">DeepSeek</td>
					<td style="text-align: center">$0.14</td>
					<td style="text-align: left">Daily automation, low-cost tasks</td>
			</tr>
			<tr>
					<td style="text-align: left">Anthropic Claude Sonnet</td>
					<td style="text-align: center">$3.00</td>
					<td style="text-align: left">Complex reasoning, code review</td>
			</tr>
			<tr>
					<td style="text-align: left">OpenAI GPT-4o</td>
					<td style="text-align: center">$2.50</td>
					<td style="text-align: left">General purpose, stable</td>
			</tr>
			<tr>
					<td style="text-align: left">OpenRouter</td>
					<td style="text-align: center">Varies</td>
					<td style="text-align: left">Access to 200+ models from one key</td>
			</tr>
	</tbody>
</table>
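<p><strong>Compliance note:</strong> Your API key is stored only on your VPS and is sent only to the provider it belongs to. All requests go directly from your Hermes instance to the provider&rsquo;s API: no middleman, and no logging by a third-party agent platform. The LLM provider itself still receives your prompts, so its own data-retention policy applies.</p>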
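<p>Because <code>~/.hermes/config.yaml</code> holds your provider keys in plain text, it&rsquo;s worth making sure only its owner can read it. A small sketch using <code>chmod</code> and GNU <code>stat</code>; the <code>secure_config</code> function name is hypothetical:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"># Hypothetical helper: lock a config file down to owner read/write (mode 600)
# and confirm it. Defaults to the config path from Step 4.
secure_config() {
  local cfg="${1:-$HOME/.hermes/config.yaml}"
  if [ ! -f "$cfg" ]; then
    echo "No config found at $cfg"
    return 1
  fi
  chmod 600 "$cfg"
  # GNU stat: %a is the octal mode, %U the owning user
  [ "$(stat -c '%a' "$cfg")" = "600" ] || return 1
  echo "$cfg is now readable only by $(stat -c '%U' "$cfg")"
}</code></pre></div>
<p>Run <code>secure_config</code> once after <code>hermes setup</code>, and again if you ever recreate the file by hand.</p>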
<hr>
<h2 id="step-5-set-up-hermes-hindsight-memory">Step 5: Set Up Hermes Hindsight Memory</h2>
<p>Hindsight is Hermes&rsquo;s persistent memory system. Without it, the agent forgets everything between sessions, like starting a new chat every time. With it, the agent remembers past conversations, learns your preferences, and builds context over time.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Initialize the Hindsight memory store</span>
</span></span><span style="display:flex;"><span>hermes setup --memory
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Verify it&#39;s running</span>
</span></span><span style="display:flex;"><span>curl http://localhost:8000/health
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Should return: {&#34;status&#34;: &#34;ok&#34;}</span>
</span></span></code></pre></div><p>Hindsight uses a local vector store (SQLite + embeddings), so there&rsquo;s no dependency on an external database. After three months of daily use, my database is still under 200MB, negligible on a 25GB disk.
If you&rsquo;re evaluating memory systems, <a href="/posts/supermemory-quick-review-2026/">Supermemory&rsquo;s approach</a> uses a different persistence strategy and is worth a look.</p>
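<p>When you script around Hermes (say, a cron job that should only start once memory is up), you can poll the health endpoint instead of eyeballing it. A sketch assuming the <code>http://localhost:8000/health</code> URL and <code>{"status": "ok"}</code> response shown in this step; <code>health_ok</code> and <code>wait_for_hindsight</code> are hypothetical names:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"># Hypothetical helpers around the Hindsight health check used in this step.

# Succeed if the JSON body on stdin reports "status": "ok".
health_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Poll the endpoint until it reports ok. Defaults: 10 attempts, 3 seconds apart.
wait_for_hindsight() {
  local url="${1:-http://localhost:8000/health}" attempts="${2:-10}" delay="${3:-3}" i
  for i in $(seq 1 "$attempts"); do
    # -f fails on HTTP errors, --max-time keeps a hung server from blocking us
    if curl -fsS --max-time 2 "$url" | health_ok; then
      echo "Hindsight is up (attempt $i)"
      return 0
    fi
    sleep "$delay"
  done
  echo "Hindsight did not report ok at $url"
  return 1
}</code></pre></div>
<p>For example, start a cron script with <code>wait_for_hindsight || exit 1</code> so it never runs against a memory store that isn&rsquo;t ready.</p>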
<hr>
<h2 id="step-6-install-skills-and-go-live">Step 6: Install Skills and Go Live</h2>
<p>Skills are what make Hermes useful beyond basic chat. The skill marketplace has everything from web scrapers to GitHub automation to Telegram bots.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># List available skills</span>
</span></span><span style="display:flex;"><span>hermes skill list
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Install a few to start</span>
</span></span><span style="display:flex;"><span>hermes skill install web-search
</span></span><span style="display:flex;"><span>hermes skill install github-pr-review
</span></span><span style="display:flex;"><span>hermes skill install cron-scheduler
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Start the agent (interactive mode)</span>
</span></span><span style="display:flex;"><span>hermes run
</span></span></code></pre></div><p>To run Hermes as a persistent service (recommended for a VPS deployment):</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Create a systemd service</span>
</span></span><span style="display:flex;"><span>cat &gt; /etc/systemd/system/hermes.service <span style="color:#e6db74">&lt;&lt; &#39;EOF&#39;
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">[Unit]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Description=Hermes Agent
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">After=network.target
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">[Service]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Type=simple
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">User=root
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">WorkingDirectory=/opt/hermes-agent
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">ExecStart=/opt/hermes-agent/.venv/bin/hermes run --daemon
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">Restart=always
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">RestartSec=10
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">[Install]
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">WantedBy=multi-user.target
</span></span></span><span style="display:flex;"><span><span style="color:#e6db74">EOF</span>
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span>systemctl daemon-reload
</span></span><span style="display:flex;"><span>systemctl enable hermes
</span></span><span style="display:flex;"><span>systemctl start hermes
</span></span><span style="display:flex;"><span>systemctl status hermes
</span></span></code></pre></div><p>If you want the WebUI:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>hermes webui
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Access at http://&lt;your-vps-ip&gt;:8080</span>
</span></span></code></pre></div><hr>
<h2 id="optional-cloudflare-tunnel-for-https-web-access">(Optional) Cloudflare Tunnel for HTTPS Web Access</h2>
<p>Don&rsquo;t have a domain? Cloudflare Tunnel gives you a <code>*.trycloudflare.com</code> subdomain with automatic HTTPS:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span><span style="color:#75715e"># Install cloudflared</span>
</span></span><span style="display:flex;"><span>curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o /usr/local/bin/cloudflared
</span></span><span style="display:flex;"><span>chmod +x /usr/local/bin/cloudflared
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Run tunnel to Hermes WebUI</span>
</span></span><span style="display:flex;"><span>cloudflared tunnel --url http://localhost:8080
</span></span></code></pre></div><p>You&rsquo;ll get a random URL like <code>https://hermes-foobar.trycloudflare.com</code> that reaches your WebUI from anywhere over HTTPS. Keep in mind that anyone who has the URL can reach it too, and the quick tunnel is temporary: you get a new URL every time <code>cloudflared</code> restarts. For a stable address, upgrade to a named tunnel with your own domain later.</p>
<hr>
<h2 id="hermes-agent-pricing-breakdown">Hermes Agent Pricing Breakdown</h2>
<p>Let&rsquo;s be honest about costs. Here&rsquo;s what you&rsquo;re actually paying:</p>
<table>
	<thead>
			<tr>
					<th style="text-align: left">Component</th>
					<th style="text-align: center">Monthly Cost</th>
					<th style="text-align: left">Notes</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left">VPS (Vultr $6 plan)</td>
					<td style="text-align: center">$6.00</td>
					<td style="text-align: left">2GB RAM, 1 vCPU, 25GB SSD</td>
			</tr>
			<tr>
					<td style="text-align: left">API usage (DeepSeek, light)</td>
					<td style="text-align: center">$2–5</td>
					<td style="text-align: left">~500k tokens/day for personal use</td>
			</tr>
			<tr>
					<td style="text-align: left">API usage (DeepSeek, heavy)</td>
					<td style="text-align: center">$10–20</td>
					<td style="text-align: left">Cron jobs + PR reviews + daily summaries</td>
			</tr>
			<tr>
					<td style="text-align: left">Domain (optional)</td>
					<td style="text-align: center">$1/mo amortized</td>
					<td style="text-align: left">~$12/year for a .com</td>
			</tr>
			<tr>
					<td style="text-align: left"><strong>Total (light usage)</strong></td>
					<td style="text-align: center"><strong>$8–11/mo</strong></td>
					<td style="text-align: left">VPS + light API usage (domain not included)</td>
			</tr>
			<tr>
					<td style="text-align: left"><strong>Total (heavy usage)</strong></td>
					<td style="text-align: center"><strong>$16–26/mo</strong></td>
					<td style="text-align: left">VPS + heavy API usage; about the price of one SaaS seat</td>
			</tr>
	</tbody>
</table>
<p>Compare that with ChatGPT Team at $25/seat/month or Claude Team at $30/seat/month. The VPS cost stays flat no matter how many people share it (API usage still grows with how much everyone uses the agent), and your data stays under your control.</p>
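<p>To plug your own numbers into the API rows, the arithmetic is tokens per day, times days, times the price per million tokens. A rough sketch, assuming every token is billed at one flat per-million rate (real invoices price output tokens higher, so the input rate gives you a floor); <code>monthly_api_cost</code> is a hypothetical helper:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"># Hypothetical estimator: monthly API spend in USD from average daily tokens.
# Usage: monthly_api_cost TOKENS_PER_DAY USD_PER_MILLION [DAYS]
monthly_api_cost() {
  awk -v t="$1" -v p="$2" -v d="${3:-30}" \
    'BEGIN { printf "%.2f\n", t * d / 1000000 * p }'
}

# Light DeepSeek usage from the table: 500k tokens/day at $0.14 per 1M tokens
monthly_api_cost 500000 0.14    # prints 2.10</code></pre></div>
<p>At 500k tokens a day that comes to about $2.10 a month, the low end of the light-usage row. Pass the provider&rsquo;s output-token rate instead to get a pessimistic ceiling.</p>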
<hr>
<h2 id="common-mistakes-i-made-so-you-dont-have-to">Common Mistakes I Made (So You Don&rsquo;t Have To)</h2>
<ol>
<li><strong>Using the system Python</strong> — Ubuntu 22.04 ships Python 3.10, but some skills need 3.11+. Install 3.11 via the deadsnakes PPA.</li>
<li><strong>Forgetting to enable swap</strong> — 2GB RAM is fine, but if you run multiple skills simultaneously, add 2GB swap: <code>fallocate -l 2G /swapfile &amp;&amp; chmod 600 /swapfile &amp;&amp; mkswap /swapfile &amp;&amp; swapon /swapfile</code>, then keep it across reboots with <code>echo '/swapfile none swap sw 0 0' | tee -a /etc/fstab</code></li>
<li><strong>Skipping the firewall</strong> — Hermes WebUI on port 8080 is exposed to the internet by default. Allow only SSH (<code>ufw allow 22/tcp &amp;&amp; ufw enable</code>) and reach the WebUI through Cloudflare Tunnel, which connects to <code>localhost:8080</code> from inside the VPS and needs no open inbound port. If you must expose 8080 directly, restrict it to your own IP: <code>ufw allow from &lt;your-ip&gt; to any port 8080 proto tcp</code>. For production, put access rules in front of the tunnel.</li>
<li><strong>Not pinning the Hermes version</strong> — Run <code>hermes --version</code> before updating, and update to a release tag rather than tracking <code>main</code>. I do this once a month to avoid surprise breaking changes.</li>
<li><strong>Ignoring logs</strong> — <code>journalctl -u hermes -f</code> is your debug best friend. When a skill fails silently, the logs always tell you why.</li>
</ol>
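<p>The version-pinning and log-checking habits above can be folded into one small update script. This is a sketch under assumptions: it relies on the repository publishing version tags (run <code>git tag</code> to confirm), reuses the <code>/opt/hermes-agent</code> checkout and <code>hermes</code> systemd unit from this guide, expects <code>uv</code> on your <code>PATH</code>, and the function names are hypothetical. <code>uv pip</code> finds the <code>.venv</code> in the current directory without activating it.</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"># Hypothetical update helpers: move Hermes to its newest release tag instead of main.

# Print the highest version-sorted tag read from stdin (GNU sort -V).
# If the project also tags pre-releases (e.g. -rc1), filter them out first.
pick_latest_tag() {
  sort -rV | sed -n 1p
}

# The body runs in a subshell ( ... ) so set -e and cd never leak into your shell.
update_hermes() (
  set -euo pipefail
  cd /opt/hermes-agent
  echo "Before: $(.venv/bin/hermes --version || echo unknown)"
  git fetch --tags --quiet
  latest="$(git tag | pick_latest_tag)"
  if [ -z "$latest" ]; then
    echo "No release tags found; staying on the current checkout."
    exit 1   # leaves only the subshell
  fi
  git checkout --quiet "$latest"
  # Reinstall so any dependency changes in the new tag are picked up
  uv pip install -e .
  systemctl restart hermes
  echo "After: $(.venv/bin/hermes --version || echo unknown)"
  # Show recent log lines so a failed start is obvious right away
  journalctl -u hermes -n 20 --no-pager
)</code></pre></div>
<p>Source the file and run <code>update_hermes</code> once a month. Checking out a tag leaves the repo on a detached HEAD, which is expected here.</p>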
<hr>
<h2 id="faq">FAQ</h2>
<p><strong>Q: Can I run Hermes on a Raspberry Pi?</strong>
<strong>A:</strong> Yes — Hermes runs on ARM64. A Pi 5 with 8GB RAM works, but expect slower skill installs. I use a Pi 4 at home for local testing before deploying skills to the VPS — for lightweight terminal-only coding tasks, <a href="/posts/oh-my-pi-quick-review-2026-06-08/">oh-my-pi</a> is actually a better fit on lower-end hardware.</p>
<p><strong>Q: Do I need Docker?</strong>
<strong>A:</strong> No. Hermes installs natively with Python + uv. Docker is optional if you want container isolation.</p>
<p><strong>Q: How do I update Hermes?</strong>
<strong>A:</strong> <code>cd /opt/hermes-agent &amp;&amp; git pull &amp;&amp; source .venv/bin/activate &amp;&amp; uv pip install -e . &amp;&amp; systemctl restart hermes</code></p>
<p><strong>Q: Can I use a different LLM provider?</strong>
<strong>A:</strong> Yes. Hermes supports DeepSeek, OpenAI, Anthropic, OpenRouter, and custom providers, and you can run several providers side by side and configure which model handles which task type.</p>
<p><strong>Q: Is this production-ready for a team?</strong>
<strong>A:</strong> Absolutely — the Kanban scheduler, multi-profile isolation, and skill system are designed for multi-user setups. Each team member gets their own profile with independent memory and skills.</p>
<hr>
<p><em>Disclosure: This post contains affiliate links for DigitalOcean and Vultr. If you sign up through these links, I may earn a credit at no extra cost to you. All recommendations are based on my personal experience running Hermes Agent in production for three months.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>Mnemo Review 2026: Rust AI Memory That Makes LLMs Actually Remember</title>
      <link>https://toolgenix.nxtniche.com/posts/mnemo-ai-memory-layer-rust-review/</link>
      <pubDate>Sun, 07 Jun 2026 00:00:00 +0000</pubDate>
      <guid>https://toolgenix.nxtniche.com/posts/mnemo-ai-memory-layer-rust-review/</guid>
      <description>Mnemo is a local-first AI memory layer that gives any LLM persistent knowledge graph memory. I deployed it, tested the API, and compared it against alternatives — here&amp;#39;s my honest review.</description>
      <content:encoded><![CDATA[<p>Look, LLMs are great at generating text but terrible at remembering what you told them five minutes ago. Every session starts from scratch: you repeat your preferences, your project context, your API keys, and the model still drifts off-topic by turn 15.</p>
<p>Most &ldquo;AI memory&rdquo; tools handle this by keeping everything in RAM or shipping your data to a cloud API. Neither scales well when you&rsquo;re running multi-session agent workflows.</p>
<p><strong>Mnemo</strong> takes a different approach. It&rsquo;s a sidecar service written in Rust: a single static binary with a persistent SQLite-backed knowledge graph, sub-5ms retrieval, and no cloud dependency as long as you pair it with a local model for entity extraction. I spun up a test instance with Docker Compose, hit every API endpoint with curl, and ran through the ingestion-retrieval cycle to see how it actually performs. Here&rsquo;s what I found.</p>
<h2 id="quick-verdict">Quick Verdict</h2>
<p>Mnemo is not a ready-to-use chatbot or a managed agent harness. But if you&rsquo;re building custom LLM pipelines and need persistent, structured, local memory that survives restarts and scales to thousands of sessions, it&rsquo;s one of the most solid options I&rsquo;ve seen at this stage. The 193 GitHub stars in five days tell part of the story; the architecture and API design tell the rest.</p>
<p>The <strong>knowledge graph</strong> layer is the real differentiator. Most tools dump raw conversation history back into your prompt and let the LLM figure out what&rsquo;s relevant. Mnemo instead extracts entities, weights relationships, does multi-hop graph traversal, and scores results before injection, which is a fundamentally better approach.</p>
<h2 id="what-is-mnemo">What Is Mnemo?</h2>
<p>Mnemo is a <strong>local memory sidecar</strong> for LLM applications. You run it alongside your app, on the same machine or a VPS, and it exposes a REST API for storing and retrieving memories.</p>
<p>Here&rsquo;s how it works. Instead of stuffing your LLM prompts with flat chat history, you feed raw text to Mnemo&rsquo;s <code>/ingest</code> endpoint. Mnemo extracts named entities and their relationships using an LLM (Ollama, OpenAI, or Anthropic, your choice) and builds a persistent knowledge graph in SQLite, backed by <code>petgraph</code> for in-memory traversal. When you call <code>/retrieve</code>, it returns a ranked, scored context prompt that you inject directly into your system message.</p>
<p>The key features:</p>
<ul>
<li>Entities are <strong>deduplicated</strong> across sessions — same person, tool, or concept gets merged automatically</li>
<li>Relationships are <strong>weighted</strong> — frequently co-occurring entities rank higher</li>
<li>Graph <strong>expansion</strong> finds indirect connections (two hops away, at default settings)</li>
<li>Results are <strong>scored</strong> — direct matches carry twice the weight of two-hop graph matches (1.0 vs 0.5), so the signal doesn&rsquo;t drown in noise</li>
</ul>
<h2 id="how-mnemo-works-architecture-deep-dive">How Mnemo Works (Architecture Deep Dive)</h2>
<p>Mnemo ships as four Rust crates in a clean layered architecture:</p>
<table>
	<thead>
			<tr>
					<th style="text-align: left">Crate</th>
					<th style="text-align: center">Type</th>
					<th style="text-align: left">What It Does</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left"><code>mnemo-core</code></td>
					<td style="text-align: center">Library</td>
					<td style="text-align: left">Entity extraction, graph ops (petgraph), retrieval engine, SQLite DB layer</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>mnemo-api</code></td>
					<td style="text-align: center">Binary</td>
					<td style="text-align: left">Axum-based REST API — thin handler layer over core</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>mnemo-cli</code></td>
					<td style="text-align: center">Binary</td>
					<td style="text-align: left">CLI tool — blocking reqwest calls against the API</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>mnemo-bench</code></td>
					<td style="text-align: center">Binary</td>
					<td style="text-align: left">12 performance benchmark suites</td>
			</tr>
	</tbody>
</table>
<p>I spent most of my time testing <code>mnemo-core</code> and <code>mnemo-api</code> because that&rsquo;s where the real engineering lives. The retrieval pipeline has six stages:</p>
<ol>
<li><strong>Full-text chunk search</strong> — SQLite FTS5 over stored memory chunks</li>
<li><strong>Entity name search</strong> — exact and fuzzy match on entity names</li>
<li><strong>Graph expansion</strong> — BFS traversal over the petgraph knowledge graph (configurable depth, default 2)</li>
<li><strong>Relation filter</strong> — keeps only entities connected by a relationship whose weight is above a threshold</li>
<li><strong>Score + rank</strong> — multiplies match quality by a graph-distance weight (direct = 1.0, 1 hop = 0.7, 2 hops = 0.5)</li>
<li><strong>Assemble context prompt</strong> — returns a ready-to-inject string with the top-K results</li>
</ol>
<p>What stood out to me during testing is that the scoring math isn&rsquo;t arbitrary. Because a direct match scores 1.0 while a two-hop match scores only 0.5, the signal-to-noise ratio degrades gracefully as you broaden the search. Most naive context dumpers don&rsquo;t even try to rank.</p>
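<p>Here&rsquo;s a toy Python sketch of stages 3 to 5 to show how that weighting plays out. It is not Mnemo&rsquo;s Rust code: the dict-of-dicts graph is an assumption, the <code>matches</code> input stands in for the output of stages 1 and 2, the 0.3 relation threshold is a made-up value, and only the 1.0 / 0.7 / 0.5 distance weights come from the description above.</p>
<pre><code class="language-python"># Toy illustration of graph expansion + distance-weighted scoring.
# NOT Mnemo's implementation: the dict-of-dicts graph, the 0.3 relation
# threshold, and "keep each entity's best score" are assumptions of this sketch.
from collections import deque

DISTANCE_WEIGHT = {0: 1.0, 1: 0.7, 2: 0.5}  # direct, 1 hop, 2 hops


def hops_from(graph, seed, depth, min_weight):
    """BFS from one seed entity; returns {entity: hop distance}.
    Edges whose relation weight is below min_weight are not followed."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # reached the configured depth; don't expand further
        for neighbor, weight in graph.get(node, {}).items():
            if weight >= min_weight and neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank(graph, matches, depth=2, min_weight=0.3, top_k=5):
    """matches: {entity: match quality in [0, 1]} from the search stages.
    Score = match quality x distance weight; returns the top_k (entity, score)."""
    if depth not in DISTANCE_WEIGHT:
        raise ValueError("this toy only defines weights for 0-2 hops")
    scores = {}
    for seed, quality in matches.items():
        for entity, hops in hops_from(graph, seed, depth, min_weight).items():
            score = quality * DISTANCE_WEIGHT[hops]
            if score > scores.get(entity, 0.0):
                scores[entity] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


graph = {
    "Alice": {"billing-service": 0.9},
    "billing-service": {"Alice": 0.9, "Postgres": 0.8},
    "Postgres": {"billing-service": 0.8, "Bob": 0.2},
}
print(rank(graph, {"Alice": 1.0}))
</code></pre>
<p>With Alice as the only direct match, this prints <code>[('Alice', 1.0), ('billing-service', 0.7), ('Postgres', 0.5)]</code>. Bob never appears: he sits three hops from Alice, and his only edge (weight 0.2) is below the threshold anyway.</p>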
<h2 id="api-walkthrough--14-endpoints-i-actually-hit-with-curl">API Walkthrough: The Endpoints I Actually Hit With curl</h2>
<p>I started the container and ran <code>curl http://localhost:8080/health</code> to confirm the service was alive. It returned server status, DB health, and the active LLM backend config, all as clean JSON, which gave me the confidence to test the full API surface.</p>
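<p>If you&rsquo;d rather script that check than eyeball curl output, a standard-library Python equivalent looks like this. It assumes the same <code>localhost:8080</code> address and simply returns whatever JSON <code>/health</code> sends back, without assuming its field names:</p>
<pre><code class="language-python"># Standard-library equivalent of `curl http://localhost:8080/health`.
# It returns the parsed JSON as-is, so it assumes nothing about field names.
import json
import urllib.error
import urllib.request


def check_health(base_url="http://localhost:8080", timeout=2.0):
    """Return the /health JSON as a Python object, or None if the check fails."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # URLError: refused/unreachable or HTTP error status;
        # OSError: timeouts; ValueError: the body was not valid JSON.
        return None


if __name__ == "__main__":
    status = check_health()
    print("Mnemo is not responding" if status is None else json.dumps(status, indent=2))
</code></pre>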
<p>Here&rsquo;s the endpoint map I worked through:</p>
<table>
	<thead>
			<tr>
					<th style="text-align: left">Method</th>
					<th style="text-align: left">Path</th>
					<th style="text-align: left">Purpose</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left"><code>GET</code></td>
					<td style="text-align: left"><code>/health</code></td>
					<td style="text-align: left">Server + DB + LLM status check</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>POST</code></td>
					<td style="text-align: left"><code>/ingest</code></td>
					<td style="text-align: left">Store text and extract entities</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>POST</code></td>
					<td style="text-align: left"><code>/retrieve</code></td>
					<td style="text-align: left">Get ranked memory context for a query</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>GET</code></td>
					<td style="text-align: left"><code>/entities</code></td>
					<td style="text-align: left">List all known entities (paginated)</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>GET</code></td>
					<td style="text-align: left"><code>/entities/:id</code></td>
					<td style="text-align: left">Get entity detail by UUID</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>DELETE</code></td>
					<td style="text-align: left"><code>/entities/:id</code></td>
					<td style="text-align: left">Delete entity (cascading)</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>GET</code></td>
					<td style="text-align: left"><code>/entities/:id/neighbors</code></td>
					<td style="text-align: left">Knowledge graph neighbors (depth max 5)</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>GET</code></td>
					<td style="text-align: left"><code>/chunks</code></td>
					<td style="text-align: left">List memory chunks (paginated)</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>POST</code></td>
					<td style="text-align: left"><code>/search</code></td>
					<td style="text-align: left">Full-text search across entities and chunks</td>
			</tr>
			<tr>
					<td style="text-align: left"><code>DELETE</code></td>
					<td style="text-align: left"><code>/wipe</code></td>
					<td style="text-align: left">Delete everything (irreversible)</td>
			</tr>
	</tbody>
</table>
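<p>For scripting against these endpoints, a small request builder keeps the method and path pairs in one place. This sketch uses only Python&rsquo;s standard library; JSON request bodies are assumed, and passing the neighbor depth as a <code>?depth=</code> query parameter is a hypothetical choice, since the table only says depth is capped at 5:</p>
<pre><code class="language-python"># Build (but don't send) requests for the Mnemo endpoints listed above.
# Assumptions: JSON request bodies, a server on localhost:8080, and a
# hypothetical `?depth=` query parameter for the neighbors endpoint.
import json
import urllib.request

BASE_URL = "http://localhost:8080"


def make_request(method, path, payload=None, base_url=BASE_URL):
    """Return a urllib Request; the body is JSON-encoded when payload is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


def neighbors_request(entity_id, depth=1):
    """GET /entities/:id/neighbors; the table caps depth at 5."""
    if depth not in range(1, 6):
        raise ValueError("depth must be between 1 and 5")
    return make_request("GET", f"/entities/{entity_id}/neighbors?depth={depth}")


list_entities = make_request("GET", "/entities")
delete_entity = make_request("DELETE", "/entities/some-uuid")  # placeholder id
</code></pre>
<p>Pass any of these to <code>urllib.request.urlopen</code> once the server is running.</p>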
<p>The two I found most useful for real-world workflows:</p>
<p><strong><code>POST /ingest</code></strong> takes <code>content</code> (required), <code>source</code> (required, e.g. &ldquo;chat&rdquo;, &ldquo;email&rdquo;, or &ldquo;cli&rdquo;), an optional <code>session_id</code>, and arbitrary <code>metadata</code> JSON. That metadata field is a small touch that makes a big difference: you can tag memories by project, priority level, or any custom taxonomy your app needs. I tested this by sending a support ticket transcript tagged with <code>&quot;priority&quot;: &quot;high&quot;</code> and saw it classified correctly in the entity graph.</p>
<p><strong><code>POST /retrieve</code></strong> takes <code>text</code>, an optional <code>session_id</code> filter, <code>max_chunks</code> (default 10), <code>max_entities</code> (default 20), <code>min_confidence</code> (default 0.5), and, critically, <code>include_graph</code> (default true) and <code>graph_depth</code> (default 2). Being able to turn graph expansion off when you want exact recall is the kind of control I appreciate after using other memory tools that force you into one mode.</p>
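<p>Here&rsquo;s what an ingest body along those lines could look like, built in Python. The field names come from the description above; the ticket text, session ID, and metadata tags are made-up examples:</p>
<pre><code class="language-python"># JSON body for POST /ingest: `content` and `source` are required,
# `session_id` and `metadata` are optional (field names from the text above).
import json


def ingest_payload(content, source, session_id=None, metadata=None):
    """Return the request body, omitting optional fields that aren't set."""
    if not content or not source:
        raise ValueError("content and source are required")
    body = {"content": content, "source": source}
    if session_id is not None:
        body["session_id"] = session_id
    if metadata:
        body["metadata"] = metadata  # arbitrary JSON, e.g. project or priority tags
    return body


# Example values below are made up for illustration.
ticket = ingest_payload(
    "Customer reports checkout failures after the 2.3 release.",
    source="email",
    session_id="support-42",
    metadata={"priority": "high", "project": "checkout"},
)
print(json.dumps(ticket, indent=2))
</code></pre>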
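<p>The same idea for <code>/retrieve</code>, filling in the defaults listed above. The <code>exact</code> flag is a hypothetical convenience of this sketch (it only sets <code>include_graph</code> to false), not a Mnemo field, and unknown overrides raise an error so a typo doesn&rsquo;t silently get sent to the server:</p>
<pre><code class="language-python"># JSON body for POST /retrieve using the defaults described above.
# `exact` is a convenience flag of this sketch, not a Mnemo field.


def retrieve_payload(text, session_id=None, exact=False, **overrides):
    """Return the request body; exact=True turns graph expansion off."""
    body = {
        "text": text,
        "max_chunks": 10,
        "max_entities": 20,
        "min_confidence": 0.5,
        "include_graph": not exact,
        "graph_depth": 2,
    }
    unknown = set(overrides) - set(body)
    if unknown:
        # Catch typos like max_chunk= instead of silently sending them.
        raise ValueError(f"unknown retrieve fields: {sorted(unknown)}")
    body.update(overrides)
    if session_id is not None:
        body["session_id"] = session_id
    return body


print(retrieve_payload("Who owns the billing service?", graph_depth=1))
print(retrieve_payload("ticket 4711", exact=True))  # exact recall only
</code></pre>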
<h2 id="performance-that-actually-matters">Performance That Actually Matters</h2>
<p>Mnemo includes 12 benchmark suites. The README publishes results from an Apple M2 using a debug build; according to the README, release builds are 3–5× faster:</p>
<table>
	<thead>
			<tr>
					<th style="text-align: left">Operation</th>
					<th style="text-align: center">Average Latency</th>
					<th style="text-align: center">Throughput</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left">Entity insert (SQLite)</td>
					<td style="text-align: center">0.12 ms</td>
					<td style="text-align: center">8,300 ops/s</td>
			</tr>
			<tr>
					<td style="text-align: left">Entity lookup by ID</td>
					<td style="text-align: center">0.08 ms</td>
					<td style="text-align: center">12,500 ops/s</td>
			</tr>
			<tr>
					<td style="text-align: left">Chunk insert</td>
					<td style="text-align: center">0.14 ms</td>
					<td style="text-align: center">7,100 ops/s</td>
			</tr>
			<tr>
					<td style="text-align: left">Full-text chunk search</td>
					<td style="text-align: center">0.28 ms</td>
					<td style="text-align: center">3,500 ops/s</td>
			</tr>
			<tr>
					<td style="text-align: left">Graph neighbor (depth=1)</td>
					<td style="text-align: center">0.21 ms</td>
					<td style="text-align: center">4,700 ops/s</td>
			</tr>
			<tr>
					<td style="text-align: left">Graph neighbor (depth=2)</td>
					<td style="text-align: center">0.89 ms</td>
					<td style="text-align: center">1,100 ops/s</td>
			</tr>
			<tr>
					<td style="text-align: left"><strong>Full retrieval pipeline</strong></td>
					<td style="text-align: center"><strong>4.2 ms</strong></td>
					<td style="text-align: center"><strong>238 ops/s</strong></td>
			</tr>
	</tbody>
</table>
<p>Sub-millisecond graph traversal at depth 2 is impressive, especially for debug-build numbers. The 4.2 ms full-pipeline figure is the one that matters most: it suggests Mnemo can sit in the hot path of a real-time agent loop without becoming a bottleneck, since even your most latency-sensitive LLM calls won&rsquo;t notice a few milliseconds spent on memory injection.</p>
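<p>A quick sanity check on the table: its throughput column is consistent with operations running one after another, so ops/s is roughly 1000 divided by the average latency in milliseconds. That&rsquo;s how 4.2 ms lines up with 238 ops/s. The sketch below also applies the README&rsquo;s 3–5× release-build speed-up as a rough range; that projection is an estimate, not a measurement:</p>
<pre><code class="language-python"># Sequential throughput is roughly 1000 / average latency in ms, which is how
# the table's numbers relate. The release-build range below applies the README's
# 3-5x speed-up claim and is an estimate, not a measurement.


def ops_per_sec(latency_ms):
    return 1000.0 / latency_ms


def release_range_ms(debug_ms, low_speedup=3.0, high_speedup=5.0):
    """(best case, worst case) release-build latency from a debug figure."""
    return debug_ms / high_speedup, debug_ms / low_speedup


full_pipeline_ms = 4.2
print(round(ops_per_sec(full_pipeline_ms)))  # 238, matching the table
fast, slow = release_range_ms(full_pipeline_ms)
print(f"release build: roughly {fast:.2f}-{slow:.2f} ms per retrieval")
</code></pre>
<p>The second line prints a release-build range of roughly 0.84 to 1.40 ms per retrieval.</p>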
<h2 id="mnemo-vs-the-alternatives">Mnemo vs. The Alternatives</h2>
<p>I compared Mnemo against the two most common approaches to AI memory: in-memory context windows and cloud-based memory services.</p>
<table>
	<thead>
			<tr>
					<th style="text-align: left">Feature</th>
					<th style="text-align: center">Mnemo</th>
					<th style="text-align: center">In-Memory (Flat Context)</th>
					<th style="text-align: center">Cloud Memory Services</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left">Runtime</td>
					<td style="text-align: center">Single Rust binary</td>
					<td style="text-align: center">— (lives in app memory)</td>
					<td style="text-align: center">Python daemon</td>
			</tr>
			<tr>
					<td style="text-align: left">Storage</td>
					<td style="text-align: center">SQLite (persistent)</td>
					<td style="text-align: center">RAM (lost on restart)</td>
					<td style="text-align: center">Cloud DB (vendor lock)</td>
			</tr>
			<tr>
					<td style="text-align: left">Graph layer</td>
					<td style="text-align: center">petgraph, multi-hop BFS</td>
					<td style="text-align: center">None</td>
					<td style="text-align: center">Sometimes basic</td>
			</tr>
			<tr>
					<td style="text-align: left">Entity dedup</td>
					<td style="text-align: center">✅ Auto across sessions</td>
					<td style="text-align: center">❌ Manual or none</td>
					<td style="text-align: center">✅</td>
			</tr>
			<tr>
					<td style="text-align: left">Scored ranking</td>
					<td style="text-align: center">✅ 6-stage pipeline</td>
					<td style="text-align: center">❌ Dumps everything</td>
					<td style="text-align: center">Partial</td>
			</tr>
			<tr>
					<td style="text-align: left">Cloud dependency</td>
					<td style="text-align: center">Zero</td>
					<td style="text-align: center">Zero</td>
					<td style="text-align: center">Required</td>
			</tr>
			<tr>
					<td style="text-align: left">LLM backend</td>
					<td style="text-align: center">Any OpenAI-compatible</td>
					<td style="text-align: center">Your app&rsquo;s LLM</td>
					<td style="text-align: center">Locked to provider</td>
			</tr>
			<tr>
					<td style="text-align: left">Latency</td>
					<td style="text-align: center">~4.2 ms full pipeline</td>
					<td style="text-align: center">~0 ms (pre-built)</td>
					<td style="text-align: center">50–200 ms (network)</td>
			</tr>
	</tbody>
</table>
<p>The tradeoff is clear: Mnemo gives up the near-zero latency of a flat in-memory context in exchange for structured, persistent, deduplicated memory. For anything beyond a single-session chatbot, that trade is worth making, and at 4.2 ms you barely feel the added latency anyway.</p>
<h2 id="who-should-use-mnemo">Who Should Use Mnemo</h2>
<p>That said, Mnemo is <strong>not</strong> for everyone. Here&rsquo;s my honest breakdown:</p>
<p><strong>Use it if:</strong></p>
<ul>
<li>You&rsquo;re building a custom AI agent or LLM pipeline and need memory that survives restarts</li>
<li>You want structured entity extraction, not raw log dumping</li>
<li>You&rsquo;re comfortable with Docker or have a Rust toolchain installed</li>
<li>You&rsquo;d rather run memory locally than pay per token for cloud memory</li>
</ul>
<p><strong>Skip it if:</strong></p>
<ul>
<li>You use a managed agent harness (Claude Code, Cursor, etc.) — those handle memory internally</li>
<li>You need a one-command chatbot that remembers — this is a sidecar service, not an app</li>
<li>Your project is a single-session script — flat context is simpler</li>
</ul>
<p>Where I think Mnemo really shines is alongside self-hosted agent environments. If you&rsquo;re running <a href="https://toolgenix.nxtniche.com/posts/2026-06-05-agent-reach-quick-look/">Agent-Reach</a> or similar tooling that gives your agents web access, adding Mnemo means they can remember what they learn and recall it across sessions. That&rsquo;s where this gets interesting.</p>
<h2 id="what-i-like">What I Like</h2>
<p><strong>The architecture is clean.</strong> Four crates, clear separation of concerns, and Axum for the API layer. The README even explains why graph-expanded results are down-weighted (to 0.5× at two hops), so the scoring is documented rather than arbitrary.</p>
<p><strong>Configuration is flexible.</strong> Environment variables, a TOML config file, or both (env vars take precedence), and <code>/health</code> reports the active config source. It&rsquo;s a small detail, but it saves debugging time.</p>
<p><strong>The Python SDK is a nice bonus.</strong> Not everyone writes Rust, and the <code>mnemo-sdk</code> pip package, which ships both a synchronous client and <code>AsyncMnemoClient</code>, lets Python-based agent frameworks plug in without wrapping the REST API manually.</p>
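<p>The precedence rule is easy to reason about once it&rsquo;s written down. This Python sketch layers defaults, a parsed TOML file (which you could load with <code>tomllib.load</code> on Python 3.11+), and environment variables, and records where each value came from, similar in spirit to the config source <code>/health</code> reports. The setting names and the <code>MNEMO_</code> prefix are hypothetical placeholders, not Mnemo&rsquo;s documented keys:</p>
<pre><code class="language-python"># Sketch of "env vars take precedence over the TOML file".
# The setting names and the MNEMO_ prefix are hypothetical placeholders.
import os

DEFAULTS = {"port": "8080", "llm_backend": "ollama"}


def resolve_config(file_cfg, env=None, prefix="MNEMO_"):
    """Layer defaults, then the parsed TOML dict, then environment variables.
    Returns (config, sources) so you can see where each value came from.
    Only keys known from the defaults or the file are looked up in env."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    sources = {key: "default" for key in DEFAULTS}
    for key, value in file_cfg.items():
        config[key], sources[key] = str(value), "file"
    for key in list(config):
        env_key = prefix + key.upper()
        if env_key in env:
            config[key], sources[key] = env[env_key], "env"
    return config, sources


cfg, src = resolve_config({"port": 9000}, env={"MNEMO_LLM_BACKEND": "openai"})
print(cfg)  # {'port': '9000', 'llm_backend': 'openai'}
print(src)  # {'port': 'file', 'llm_backend': 'env'}
</code></pre>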
<p><strong>122 Rust tests + 21 Python tests + 12 benchmarks.</strong> For a project that&rsquo;s been public for 5 days, that&rsquo;s a strong signal the author cares about correctness.</p>
<h2 id="what-could-be-better">What Could Be Better</h2>
<p><strong>No pre-built release binaries yet.</strong> You have to compile from source or use Docker. For a Rust project that promises &ldquo;single static binary deployment,&rdquo; shipping pre-built binaries for Linux x86_64 and ARM64 would cut the setup friction in half. For now, Docker is the smoothest path: I had it running in about three minutes.</p>
<p><strong>Entity extraction quality depends entirely on your LLM.</strong> Mnemo doesn&rsquo;t do its own named-entity recognition (NER); it delegates extraction to whatever LLM you configure. Feed it a weak model and you&rsquo;ll get weak entities: the system is only as smart as the LLM behind it.</p>
<p><strong>The project is 5 days old.</strong> 193 stars is legit for a Rust project that young, but there&rsquo;s no community yet, no plugin ecosystem, and no mature documentation beyond the README and a handful of markdown docs. Using it today makes you an early adopter, with the tradeoffs that implies.</p>
<p>My take after using it: none of these is a dealbreaker for the right use case.</p>
<h2 id="self-hosted-mnemo-deployment">Self-Hosted Mnemo Deployment</h2>
<p>If you want Mnemo running 24/7 as a memory backend for your agents, you&rsquo;ll want to deploy it on a VPS. Here&rsquo;s the setup I used:</p>
<ol>
<li>Spin up a Linux VM (the cheapest tier on any cloud provider works — 1 vCPU, 1 GB RAM is plenty for the Mnemo binary itself; you&rsquo;ll want more if you run Ollama on the same machine)</li>
<li>Install Docker (or compile from source)</li>
<li>Run <code>docker compose up -d</code> from the cloned repo</li>
<li>Optionally add Ollama on the same machine for fully local entity extraction</li>
</ol>
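<p>Right after <code>docker compose up -d</code>, the service may need a few seconds before it answers. A small readiness loop helps in deploy scripts; this is a sketch that assumes Mnemo&rsquo;s <code>/health</code> endpoint on <code>localhost:8080</code>, and the retry count and delay are arbitrary choices:</p>
<pre><code class="language-python"># Poll until a probe succeeds, e.g. right after `docker compose up -d`.
# http_probe assumes Mnemo's /health endpoint on localhost:8080.
import time
import urllib.error
import urllib.request


def wait_until_ready(probe, attempts=30, delay_s=1.0):
    """Call probe() until it returns True; give up after `attempts` tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt != attempts - 1:
            time.sleep(delay_s)  # no need to sleep after the final try
    return False


def http_probe(url="http://localhost:8080/health", timeout=2.0):
    """True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    print("Mnemo is up" if wait_until_ready(http_probe) else "gave up waiting")
</code></pre>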
<!-- BEGIN AFFILIATE LINKS (generated by ads-center for ToolGenix) -->
<p><em>Disclosure: Some of the links below are affiliate links. If you sign up through them, I may earn a commission at no extra cost to you.</em></p>
<p>To run Mnemo 24/7, you&rsquo;ll need a VPS. I recommend <strong>DigitalOcean</strong>: new users get <strong>$200 in free credit</strong> (valid for 60 days), which covers running Mnemo for the entire credit window. The $6/month basic Droplet handles the Mnemo binary itself easily; if you also want Ollama on the same machine, choose a larger Droplet with more RAM:</p>
<p><a href="https://toolgenix.nxtniche.com/go/do" rel="nofollow sponsored" target="_blank">→ DigitalOcean: Get $200 Free Credit</a></p>
<p>Prefer a provider with more global regions or better Asia-Pacific coverage? <strong>Vultr</strong> offers datacenters worldwide and new accounts receive <strong>$50–100 in credit</strong>. Their $6/month cloud instances are equally suitable:</p>
<p><a href="https://toolgenix.nxtniche.com/go/vultr" rel="nofollow sponsored" target="_blank">→ Vultr: Start with Free Credit</a></p>
<!-- END AFFILIATE LINKS -->
<p>For the VPS, either <strong>DigitalOcean</strong> or <strong>Vultr</strong> works: both offer $6–12/month droplets/instances that handle this workload easily. If you need GPUs to run larger extraction models yourself instead of calling a hosted API, <strong>AWS</strong> spot GPU instances work well for batch processing.</p>
<!-- BEGIN AFFILIATE LINKS (generated by ads-center for ToolGenix) -->
<p>If you prefer to run LLM extraction on your own hardware rather than renting cloud GPU instances, a dedicated GPU is the way to go. The <strong>NVIDIA GeForce RTX 4090</strong> is currently one of the best consumer cards for local LLM inference, and its 24 GB of VRAM comfortably handles quantized models up to ~13B parameters:</p>
<p><a href="https://toolgenix.nxtniche.com/go/amazon/B0CHZG4B5X" rel="nofollow sponsored" target="_blank">→ NVIDIA RTX 4090 on Amazon (check current price)</a></p>
<p>For a more budget-friendly option, the <strong>RTX 4070 Super</strong> (12 GB VRAM) works well for 7B-parameter models:</p>
<p><a href="https://toolgenix.nxtniche.com/go/amazon/B0CZ9D4TKK" rel="nofollow sponsored" target="_blank">→ NVIDIA RTX 4070 Super on Amazon</a></p>
<!-- END AFFILIATE LINKS -->
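<p>The GPU sizing above follows from simple arithmetic: weight memory in bytes is roughly parameters × bits per weight ÷ 8. At 16-bit precision a 13B model&rsquo;s weights alone come to about 26 GB, more than a 24 GB card holds, so the ~13B figure assumes quantized weights. This sketch ignores the KV cache and runtime overhead, so treat its numbers as a floor rather than a guarantee:</p>
<pre><code class="language-python"># Weight memory only: parameters x bits per weight / 8 bytes.
# Ignores the KV cache and runtime overhead, so real usage is higher.


def weights_gb(params, bits_per_weight):
    return params * bits_per_weight / 8 / 1e9


for label, params in (("7B", 7e9), ("13B", 13e9)):
    for bits in (16, 8, 4):
        print(f"{label} at {bits}-bit: {weights_gb(params, bits):.1f} GB")
</code></pre>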
<p>The Docker Compose setup is the easiest path: the repo includes a <code>docker-compose.yml</code> that wires Mnemo to a bundled Ollama instance. One command gets you a fully local, persistent AI memory layer.</p>
<h2 id="final-verdict">Final Verdict</h2>
<table>
	<thead>
			<tr>
					<th style="text-align: left">Dimension</th>
					<th style="text-align: center">Rating</th>
					<th style="text-align: left">Notes</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left">Architecture</td>
					<td style="text-align: center">⭐⭐⭐⭐½</td>
					<td style="text-align: left">Clean crate layering, petgraph-based graph engine, 6-stage retrieval pipeline</td>
			</tr>
			<tr>
					<td style="text-align: left">Performance</td>
					<td style="text-align: center">⭐⭐⭐⭐⭐</td>
					<td style="text-align: left">4.2 ms full pipeline on M2 (debug build), sub-millisecond graph ops</td>
			</tr>
			<tr>
					<td style="text-align: left">Ease of use</td>
					<td style="text-align: center">⭐⭐⭐</td>
					<td style="text-align: left">Docker is easy; no pre-built binaries yet</td>
			</tr>
			<tr>
					<td style="text-align: left">Documentation</td>
					<td style="text-align: center">⭐⭐⭐⭐</td>
					<td style="text-align: left">README is thorough, API docs are clear, could use more deployment guides</td>
			</tr>
			<tr>
					<td style="text-align: left">Maturity</td>
					<td style="text-align: center">⭐⭐⭐</td>
					<td style="text-align: left">5 days old, solid foundations but early</td>
			</tr>
			<tr>
					<td style="text-align: left">Value</td>
					<td style="text-align: center">⭐⭐⭐⭐½</td>
					<td style="text-align: left">Free + MIT + zero cloud dependency = hard to beat</td>
			</tr>
	</tbody>
</table>
<p>Mnemo solves a real problem, LLM memory, with genuinely good architecture. It&rsquo;s not a mass-market product; it&rsquo;s a developer tool written in Rust, designed to be self-hosted and fully under your control.</p>
<p>If you&rsquo;re building custom LLM pipelines and have been hacking together flat context dumps or paying for cloud memory APIs, give Mnemo a look. The knowledge-graph approach to memory is where the space needs to go. At 193 stars and climbing, I suspect I&rsquo;m not the only one who thinks so.</p>
<p><em>Disclosure: Some links in this article are affiliate links. If you sign up through them, I may earn a commission at no extra cost to you.</em></p>
]]></content:encoded>
    </item>
  </channel>
</rss>
