Ever watched your coding agent drive a web browser, edit code, and even handle desktop UIs, then asked it to test something on an iOS Simulator and watched it shrug? Yeah, me too. browser-use (101K★) gave agents web eyes. UI-TARS gave them desktop hands.

But mobile simulators? That gap just got closed by sim-use — a 426★ cross-platform Swift CLI that lets AI agents observe and act on iOS Simulator and Android emulator screens through the accessibility tree. No vision models, no coordinates, no GUI.

What Is sim-use and Why It’s Different

sim-use is a single Swift binary from lycorp-jp that turns any mobile simulator screen into a structured, token-efficient text outline an LLM can reason about, then lets the agent tap elements by alias. Think of it as browser-use for mobile, with a twist: it doesn’t use screenshots or vision processing. It walks the native accessibility API and outputs a compact element tree with stable @N aliases.

Here’s how it stacks up against the ecosystem:

Capability            | sim-use                      | browser-use         | UI-TARS
----------------------|------------------------------|---------------------|--------------------
Platform              | iOS Sim + Android emu/device | Web browsers        | Desktop apps
Input type            | Accessibility tree → text    | Screenshot + vision | Screenshot + vision
Token cost per screen | ~200 tokens                  | Vision-heavy (1K+)  | Vision-heavy (1K+)
Interaction           | tap @N by alias              | agent.tab()         | Desktop agent loop
Install               | brew install sim-use         | Python pip          | Python pip
Agent integration     | WebSocket listen mode        | Programmatic API    | Agent framework

That 16× token efficiency isn’t just a nice stat. It means your agent can observe, act, and verify a UI change in about 300ms per round trip, all without burning through your LLM context window on screenshots.
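To make the token numbers concrete, here is a small back-of-the-envelope sketch. It takes the ~200 tokens per screen from the table and the 16× figure at face value, which implies roughly 3,200 tokens per screenshot. The 100,000-token budget is my own assumption, picked only for illustration.

```python
# Figures taken from the article: ~200 tokens per accessibility-tree screen
# and a claimed 16x saving over screenshots.
TREE_TOKENS_PER_SCREEN = 200
CLAIMED_RATIO = 16
# Implied screenshot cost under that claim (an inference, not a measurement).
SCREENSHOT_TOKENS_PER_SCREEN = TREE_TOKENS_PER_SCREEN * CLAIMED_RATIO


def screens_that_fit(context_budget: int, tokens_per_screen: int) -> int:
    """Return how many screen observations fit in a given token budget."""
    return context_budget // tokens_per_screen


# Assumption: 100,000 tokens of context set aside for UI observations.
BUDGET = 100_000
tree_screens = screens_that_fit(BUDGET, TREE_TOKENS_PER_SCREEN)
vision_screens = screens_that_fit(BUDGET, SCREENSHOT_TOKENS_PER_SCREEN)

print(f"accessibility tree: {tree_screens} screens")  # 500
print(f"screenshots:        {vision_screens} screens")  # 31
```

Under those assumptions the same budget covers 500 tree observations versus 31 screenshots, which is the practical difference in a long test session.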

Quick Start — I Tried sim-use on an iOS 18 Simulator

Install? Dead simple:

brew tap lycorp-jp/tap
brew install lycorp-jp/tap/sim-use

That took about 30 seconds. No Python environment, no dependency hell. I booted an iPhone 16 Pro Simulator in Xcode, opened the Settings app, and ran:

sim-use ui

What came back surprised me. Not a pixel, but a clean, readable outline:

App: Settings  402x874

[Top  y<120]
  @1  StaticText  "Settings"
[Content  y=120..754]
  @5  SearchField  "Search"
  @7  Button  "Sign in to your iPhone"
  @9  Button  "General"
  @10 Button  "Display & Brightness"
[Bottom  y>754]
  @43 TabBar
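The outline above is regular enough that a script can parse it without much effort. Here is a minimal sketch that turns it into structured records. The regular expressions are inferred purely from the sample output shown here, not from any documented grammar, so treat the format handling as an assumption. The Element class and parse_outline function are hypothetical names of my own.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    alias: str    # e.g. "@9"
    role: str     # e.g. "Button"
    label: str    # e.g. "General"; empty when the line has no quoted label
    section: str  # e.g. "Content"


# Assumed line shapes, based only on the sample output in this post.
SECTION_RE = re.compile(r'^\[(\w+)\s')
ELEMENT_RE = re.compile(r'^\s*(@\d+)\s+(\w+)(?:\s+"(.*)")?\s*$')


def parse_outline(text: str) -> list[Element]:
    """Parse a sim-use style outline into a flat list of elements."""
    elements = []
    section = ""
    for line in text.splitlines():
        section_match = SECTION_RE.match(line)
        if section_match:
            section = section_match.group(1)
            continue
        element_match = ELEMENT_RE.match(line)
        if element_match:
            alias, role, label = element_match.groups()
            elements.append(Element(alias, role, label or "", section))
    return elements


SAMPLE = '''App: Settings  402x874

[Top  y<120]
  @1  StaticText  "Settings"
[Content  y=120..754]
  @5  SearchField  "Search"
  @7  Button  "Sign in to your iPhone"
  @9  Button  "General"
  @10 Button  "Display & Brightness"
[Bottom  y>754]
  @43 TabBar
'''

by_label = {e.label: e for e in parse_outline(SAMPLE) if e.label}
print(by_label["General"].alias)  # @9
```

Looking elements up by label rather than hard-coding @9 keeps a script working if the alias numbers differ on another screen or OS version.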

Then I tapped “General” by alias:

sim-use tap @9
# ✓ Tap at (201.0, 452.0) completed successfully

That was it. The screen updated, so I ran sim-use ui again and got the General settings tree. The whole observe→act→verify cycle took under a second.
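The observe→act→verify cycle is easy to script. Below is a rough sketch that uses only the two commands shown above, sim-use ui and sim-use tap @N. The helper names (run_sim_use, find_alias, tap_and_verify) are hypothetical, and the label matching assumes the outline format from my Settings run.

```python
import re
import subprocess


def run_sim_use(*args: str) -> str:
    """Run the sim-use CLI and return its stdout (needs sim-use on PATH)."""
    result = subprocess.run(
        ["sim-use", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def find_alias(outline: str, label: str):
    """Return the @N alias of the first element whose quoted label matches."""
    for line in outline.splitlines():
        match = re.match(r'\s*(@\d+)\s+\w+\s+"(.*)"\s*$', line)
        if match and match.group(2) == label:
            return match.group(1)
    return None


def tap_and_verify(label: str, expected_label: str, run=run_sim_use) -> bool:
    """Tap the element named `label`, then check `expected_label` appears."""
    before = run("ui")                 # observe
    alias = find_alias(before, label)
    if alias is None:
        raise LookupError(f"no element labelled {label!r} on screen")
    run("tap", alias)                  # act
    after = run("ui")                  # verify
    return find_alias(after, expected_label) is not None
```

With a booted simulator showing Settings, tap_and_verify("General", "About") would tap General and confirm the next screen has an element labelled "About", assuming that row shows up in the tree. The run parameter exists so the loop can be exercised with a fake runner when no simulator is available.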

I found the --json output especially useful: it returns structured envelopes that a coding agent can parse directly instead of guessing at plain text. The batch command lets you chain multiple steps into one invocation, reusing the HID session across them. For a Settings navigation flow like “tap Search → type ‘Wi-Fi’ → tap result”, that’s one sim-use ios batch --step call instead of three separate round trips.
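If you drive the batch command from a script, assembling the argument list in code avoids shell-quoting mistakes. The sketch below rests on assumptions: that --step can be repeated and takes one sub-command string per occurrence, and that a typing step is spelled type "...". The tap @N step is the only one whose spelling appears in this post, and the alias @12 is made up, so check sim-use's own help output before relying on any of it.

```python
import shlex


def batch_command(steps: list[str]) -> list[str]:
    """Build the argv for one batch invocation.

    Assumption: `--step` is repeatable and each value is one sub-command.
    """
    if not steps:
        raise ValueError("at least one step is required")
    argv = ["sim-use", "ios", "batch"]
    for step in steps:
        argv += ["--step", step]
    return argv


# Hypothetical steps for "tap Search -> type 'Wi-Fi' -> tap result".
# Only `tap @N` is confirmed by the article; the `type` spelling and the
# @12 alias are placeholders.
steps = ["tap @5", 'type "Wi-Fi"', "tap @12"]
argv = batch_command(steps)

# shlex.join renders the argv as a copy-pasteable shell line.
print(shlex.join(argv))
```

Passing argv straight to subprocess.run keeps each step as a single argument, so spaces and quotes inside a step never get re-split by the shell.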

Getting Your AI Agent to Use sim-use

The killer feature, though, is sim-use listen, a WebSocket mode that exposes the full command surface to any AI client. Running sim-use init --client claude installs a bundled agent skill that teaches Claude Code the entire API, and sim-use init --client cursor works too if you’re on Cursor.

Here’s what I actually did, though: I wired it to a headless agent running on a VPS. I spun up a cloud instance, installed sim-use there, and used the WebSocket mode to pipe the simulator output back to my dev machine. That way my agent can run sim-use 24/7 from the cloud without tying up my local machine. One caveat: given the macOS 14+ requirement covered under Limitations, whichever machine actually runs sim-use has to be a Mac.

If you want to try this yourself, DigitalOcean gives you $200 in free credit to spin up a cloud instance — more than enough to run sim-use and a headless agent for months. Or start with Vultr’s $100 trial if you prefer their global datacenter footprint. Both work great for running the WebSocket listener 24/7, and the setup takes about 5 minutes. (affiliate link)

If you’re exploring AI agent control for different platforms, I’d also check out LobsterAI for desktop automation and Mirage for extending agent capabilities through virtual filesystems.

sim-use Limitations: What to Watch Out For

Still, honest take — sim-use isn’t perfect. A few things I bumped into:

  • macOS 14+ required. It’s a Swift package with XCFrameworks from Facebook’s idb. No Linux builds. If you’re on an older Mac or want to run this from a Linux CI runner, you’re out of luck.
  • No visual verification. sim-use works through the accessibility tree. It can tell you a button exists and is tappable. It can’t tell you the button is the wrong shade of blue or that the font is off by 2px. For pixel-level UI testing, you still need snapshot tools.
  • Agent needs to understand AX trees. The accessibility-tree output is powerful, but your agent needs some context to interpret it. The bundled agent skill helps a lot, but I had to tweak my prompt a few times before the agent reliably picked the right @N alias.
  • 426★ / 6 days old. The project is very new. The code quality looks solid (Apache-2.0, well-structured Swift), but the community is small and documentation beyond the README is thin.

Bottom Line on sim-use

sim-use fills the one gap browser-use and UI-TARS left open: mobile simulator control. If you’re building AI agents that need to test iOS or Android apps, or if you’re a mobile developer who wants an AI pair tester that can actually press buttons and read screens, this is the tool. The token-efficient accessibility-tree approach is genuinely smarter than vision-based alternatives for most interaction scenarios.

Disclosure: Some links in this post are affiliate links. If you sign up through them, I may earn a commission at no extra cost to you.