<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Swift CLI on ToolGenix — Open-Source AI &amp; Developer Tools: Honest Hands-On Reviews</title><link>https://toolgenix.nxtniche.com/tags/swift-cli/</link><description>Recent content in Swift CLI on ToolGenix — Open-Source AI &amp; Developer Tools: Honest Hands-On Reviews</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 02 Jul 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://toolgenix.nxtniche.com/tags/swift-cli/index.xml" rel="self" type="application/rss+xml"/><item><title>sim-use: AI Agent Mobile Simulator Control (Quick Look)</title><link>https://toolgenix.nxtniche.com/posts/sim-use-ai-agent-mobile-simulator-control/</link><pubDate>Thu, 02 Jul 2026 00:00:00 +0000</pubDate><guid>https://toolgenix.nxtniche.com/posts/sim-use-ai-agent-mobile-simulator-control/</guid><description>sim-use is a cross-platform Swift CLI giving AI agents accessibility-tree control on iOS Simulator &amp;amp; Android emulators — no screenshots, no vision needed.</description><content:encoded><![CDATA[<p>Ever watched your coding agent drive a web browser, edit code, and even handle desktop UIs — then ask it to test something on an iOS Simulator and watch it shrug? Yeah, me too. browser-use (101K★) gave agents web eyes. UI-TARS gave them desktop hands.</p>
<p>But mobile simulators? That gap just got closed by <strong>sim-use</strong> — a 426★ cross-platform Swift CLI that lets AI agents observe and act on iOS Simulator and Android emulator screens through the accessibility tree. No vision models, no coordinates, no GUI.</p>
<h2 id="what-is-sim-use-and-why-its-different">What Is sim-use and Why It&rsquo;s Different</h2>
<p>sim-use is a single Swift binary from lycorp-jp that turns any mobile simulator screen into a structured, token-efficient text outline an LLM can reason about, then lets the agent tap elements by alias. Think of it as browser-use for mobile, with one twist: it doesn&rsquo;t use screenshots or vision processing. Instead, it walks the native accessibility API and outputs a compact element tree with stable <code>@N</code> aliases.</p>
<p>Here&rsquo;s how it stacks up against browser-use and UI-TARS:</p>
<table>
	<thead>
			<tr>
					<th style="text-align: left">Capability</th>
					<th style="text-align: center">sim-use</th>
					<th style="text-align: center">browser-use</th>
					<th style="text-align: center">UI-TARS</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td style="text-align: left">Platform</td>
					<td style="text-align: center">iOS Sim + Android emu/device</td>
					<td style="text-align: center">Web browsers</td>
					<td style="text-align: center">Desktop apps</td>
			</tr>
			<tr>
					<td style="text-align: left">Input type</td>
					<td style="text-align: center">Accessibility tree → text</td>
					<td style="text-align: center">Screenshot + vision</td>
					<td style="text-align: center">Screenshot + vision</td>
			</tr>
			<tr>
					<td style="text-align: left">Token cost per screen</td>
					<td style="text-align: center">~200 tokens</td>
					<td style="text-align: center">Vision-heavy (1K+)</td>
					<td style="text-align: center">Vision-heavy (1K+)</td>
			</tr>
			<tr>
					<td style="text-align: left">Interaction</td>
					<td style="text-align: center"><code>tap @N</code> by alias</td>
					<td style="text-align: center"><code>agent.run()</code> task loop</td>
					<td style="text-align: center">Desktop agent loop</td>
			</tr>
			<tr>
					<td style="text-align: left">Install</td>
					<td style="text-align: center"><code>brew install lycorp-jp/tap/sim-use</code></td>
					<td style="text-align: center">Python pip</td>
					<td style="text-align: center">Python pip</td>
			</tr>
			<tr>
					<td style="text-align: left">Agent integration</td>
					<td style="text-align: center">WebSocket <code>listen</code> mode</td>
					<td style="text-align: center">Programmatic API</td>
					<td style="text-align: center">Agent framework</td>
			</tr>
	</tbody>
</table>
<p>That 16× token efficiency isn&rsquo;t just a nice stat (even the table&rsquo;s conservative 1K+ floor works out to 5×). It means your agent can observe, act, and verify a UI change in about 300ms per round trip, all without burning through your LLM context window on screenshots.</p>
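<p>Taking the table&rsquo;s rough figures at face value, the saving is easy to sanity-check with shell arithmetic. This is only a sketch: the ~200 and 1K+ token counts come from the table, while the 32,000-token budget and the <code>screens_in_budget</code> helper are hypothetical, picked for illustration rather than taken from sim-use.</p>
<pre tabindex="0"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# Back-of-the-envelope token budgeting for agent observations.
# Token counts are the table's rough estimates; the budget is hypothetical.

outline_tokens=200    # ~200 tokens for one sim-use outline (from the table)
vision_tokens=1000    # lower bound of the table's "1K+" per screenshot
budget=32000          # hypothetical share of the context window for observations

# screens_in_budget BUDGET COST: whole observations that fit in the budget.
screens_in_budget() {
  echo $(( $1 / $2 ))
}

echo "outlines that fit:    $(screens_in_budget "$budget" "$outline_tokens")"
echo "screenshots that fit: $(screens_in_budget "$budget" "$vision_tokens")"
echo "tokens per screenshot implied by a 16x ratio: $(( 16 * outline_tokens ))"
</code></pre>
<p>With those inputs the script reports 160 outlines versus 32 screenshots per 32,000 tokens, and a 16× ratio would correspond to about 3,200 tokens per screenshot.</p>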
<h2 id="quick-start--i-tried-sim-use-on-an-ios-18-simulator">Quick Start — I Tried sim-use on an iOS 18 Simulator</h2>
<p>Install? Dead simple:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>brew tap lycorp-jp/tap
</span></span><span style="display:flex;"><span>brew install lycorp-jp/tap/sim-use
</span></span></code></pre></div><p>That took about 30 seconds. No Python environment, no dependency hell. I then booted an iPhone 16 Pro Simulator from Xcode, opened the Settings app, and ran:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>sim-use ui
</span></span></code></pre></div><p>What came back surprised me. Not a single pixel, just a clean, readable outline:</p>
<pre tabindex="0"><code>App: Settings  402x874

[Top  y&lt;120]
  @1  StaticText  &#34;Settings&#34;
[Content  y=120..754]
  @5  SearchField  &#34;Search&#34;
  @7  Button  &#34;Sign in to your iPhone&#34;
  @9  Button  &#34;General&#34;
  @10 Button  &#34;Display &amp; Brightness&#34;
[Bottom  y&gt;754]
  @43 TabBar
</code></pre><p>Then I tapped &ldquo;General&rdquo; by alias:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-bash" data-lang="bash"><span style="display:flex;"><span>sim-use tap @9
</span></span><span style="display:flex;"><span><span style="color:#75715e"># ✓ Tap at (201.0, 452.0) completed successfully</span>
</span></span></code></pre></div><p>That was it. The screen updated, so I ran <code>sim-use ui</code> again and got the General settings tree. The whole observe→act→verify cycle took under a second.</p>
<p>I found the <code>--json</code> output especially useful: it returns structured envelopes that a coding agent can parse directly instead of guessing at plain text. The <code>batch</code> command lets you chain multiple steps into one invocation, reusing the HID session across them. For a Settings navigation flow like &ldquo;tap Search → type &lsquo;Wi-Fi&rsquo; → tap result&rdquo;, that&rsquo;s one <code>sim-use ios batch --step</code> call instead of three separate round trips.</p>
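<p>The observe→act step is easy to script when the agent knows a label but not its alias. Here&rsquo;s a minimal sketch that assumes the plain-text outline keeps the <code>@N  Role  "Label"</code> layout from the sample output; <code>alias_for</code> is a hypothetical helper, not part of sim-use, and the outline is pasted in as a string so the snippet runs without a simulator.</p>
<pre tabindex="0"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# Sketch: resolve a visible label to its @N alias in a `sim-use ui` outline.
# Assumes each element line reads: @N  Role  "Label" (as in the sample output).

# alias_for LABEL: reads an outline on stdin, prints the first matching alias.
alias_for() {
  awk -v want="$1" '
    $1 ~ /^@[0-9]+$/ {
      line = $0
      sub(/[ \t\r]+$/, "", line)        # drop trailing whitespace
      q = index(line, "\"")             # position of the opening quote
      if (q == 0) next                  # unlabeled element, e.g. TabBar
      if (substr(line, q) == "\"" want "\"") { print $1; exit }
    }
  '
}

# Content section of the Settings outline, pasted in so no simulator is needed.
outline='App: Settings  402x874
[Content  y=120..754]
  @5  SearchField  "Search"
  @7  Button  "Sign in to your iPhone"
  @9  Button  "General"'

target=$(printf '%s\n' "$outline" | alias_for "General")
echo "sim-use tap $target"
</code></pre>
<p>This prints <code>sim-use tap @9</code>. In a real loop you would pipe <code>sim-use ui</code> into <code>alias_for</code> and run the resulting command rather than echo it.</p>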
<h2 id="getting-your-ai-agent-to-use-sim-use">Getting Your AI Agent to Use sim-use</h2>
<p>The killer feature, though, is <code>sim-use listen</code>: a WebSocket mode that exposes the full command surface to any AI client. Running <code>sim-use init --client claude</code> installs a bundled agent skill that teaches Claude Code the entire API, and <code>sim-use init --client cursor</code> works too if you&rsquo;re on Cursor.</p>
<p>Here&rsquo;s what I actually did: I wired it to a headless agent running on a VPS. I spun up a cloud instance for the agent and used the WebSocket mode to connect it to sim-use running on the Mac that hosts the simulator (sim-use itself needs macOS). That way the agent loop runs 24/7 from the cloud instead of on my local machine.</p>
<p>If you want to try this yourself, <a href="https://toolgenix.nxtniche.com/go/do" rel="nofollow sponsored noopener" target="_blank">DigitalOcean</a> gives you $200 in free credit to spin up a cloud instance, more than enough to run a headless agent for months. Or start with <a href="https://toolgenix.nxtniche.com/go/vultr" rel="nofollow sponsored noopener" target="_blank">Vultr&rsquo;s $100 trial</a> if you prefer their global datacenter footprint. Both work well for keeping the agent side of the WebSocket connection running 24/7, and the setup takes about 5 minutes. <em>(affiliate links)</em></p>
<p>If you&rsquo;re exploring AI agent control for different platforms, I&rsquo;d check out <a href="/posts/lobsterai-desktop-ai-agent-quick-look/">LobsterAI</a> for desktop automation and <a href="/posts/mirage-review-virtual-filesystem-for-ai-agents-50-backends/">Mirage</a> for extending agent capabilities through virtual filesystems.</p>
<h2 id="sim-use-limitations-what-to-watch-out-for">sim-use Limitations: What to Watch Out For</h2>
<p>Still, honest take: sim-use isn&rsquo;t perfect. A few things I bumped into:</p>
<ul>
<li><strong>macOS 14+ required.</strong> It&rsquo;s a Swift package with XCFrameworks from Facebook&rsquo;s idb. No Linux builds. If you&rsquo;re on an older Mac or want to run this from a Linux CI runner, you&rsquo;re out of luck.</li>
<li><strong>No visual verification.</strong> sim-use works through the accessibility tree. It can tell you a button exists and is tappable. It can&rsquo;t tell you the button is the wrong shade of blue or that the font is off by 2px. For pixel-level UI testing, you still need snapshot tools.</li>
<li><strong>Agent needs to understand AX trees.</strong> The accessibility-tree output is powerful, but your agent needs some context to interpret it. The bundled agent skill helps a lot, but I had to tweak my prompt a few times before the agent reliably picked the right <code>@N</code> alias.</li>
<li><strong>426★ / 6 days old.</strong> The project is very new. The code quality looks solid (Apache-2.0, well-structured Swift), but the community is small and documentation beyond the README is thin.</li>
</ul>
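<p>On the prompt-tuning point, one thing that may help is handing the agent a flat menu of labeled elements instead of the raw outline. The sketch below again assumes the <code>@N  Role  "Label"</code> layout from the sample output; <code>labeled_menu</code> is a hypothetical helper, not a sim-use command, and whether it improves alias picking for your agent is something to test rather than assume.</p>
<pre tabindex="0"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# Sketch: flatten a `sim-use ui` outline into tab-separated alias, role, label
# rows that can be pasted into an agent prompt. Layout is assumed, not documented.

# labeled_menu: reads an outline on stdin, prints one row per labeled element.
labeled_menu() {
  awk '
    $1 ~ /^@[0-9]+$/ {
      q = index($0, "\"")               # opening quote of the label
      if (q == 0) next                  # skip unlabeled containers
      label = substr($0, q + 1)
      sub(/"[ \t\r]*$/, "", label)      # strip closing quote and trailing space
      printf "%s\t%s\t%s\n", $1, $2, label
    }
  '
}

# Content section of the Settings outline from the quick start.
outline='[Content  y=120..754]
  @5  SearchField  "Search"
  @7  Button  "Sign in to your iPhone"
  @9  Button  "General"'

printf '%s\n' "$outline" | labeled_menu
</code></pre>
<p>Each row carries exactly what the agent needs to choose a target: the alias to tap, the element role, and the visible label.</p>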
<h2 id="bottom-line-on-sim-use">Bottom Line on sim-use</h2>
<p>sim-use fills the one gap browser-use and UI-TARS left open: mobile simulator control. If you&rsquo;re building AI agents that need to test iOS or Android apps, or if you&rsquo;re a mobile developer who wants an AI pair tester that can actually press buttons and read screens, this is the tool. The token-efficient accessibility-tree approach is genuinely smarter than vision-based alternatives for most interaction scenarios.</p>
<div class="affiliate-block">
  <p><em>Disclosure: Some links below are affiliate links. If you sign up through them, I may earn a commission at no extra cost to you.</em></p>
  <ul>
    <li><a href="https://toolgenix.nxtniche.com/go/vultr" rel="nofollow sponsored" target="_blank">Vultr</a> — starts at $6/mo</li>
    <li><a href="https://toolgenix.nxtniche.com/go/do" rel="nofollow sponsored" target="_blank">DigitalOcean</a> — $200 credit for new users</li>
    <li><a href="https://toolgenix.nxtniche.com/go/amazon/1835462316" rel="nofollow sponsored" target="_blank">Building LLM Powered Applications</a> — on Amazon</li>
  </ul>
</div>
]]></content:encoded></item></channel></rss>