
Cua
The open-source Docker container for computer-use AI agents — a cloud desktop for every agent, in one command.
Thesis
- 01
Computer-use is the next API surface. Anthropic shipped computer use with Claude 3.5 Sonnet in October 2024 — 14.9% on OSWorld, nearly 2× the next-best model.[7] OpenAI followed with Operator in January 2025; Google with Gemini 2.5 Computer Use in October 2025.[4] [19] By February 2026 the frontier had crossed OSWorld's 72.36% human baseline.[16] [17] GUIs are the universal interface: the agent that controls a desktop can do any work a human can. Browser-only is a subset; the full stack is the OS.
- 02
Docker is the right metaphor — and the right wedge. A single command spins up a sandboxed desktop. Containerization unlocks fan-out, reproducibility, and security for agent workloads. Browserbase did this for browsers and built a real business on top; Cua extends the same primitive to macOS, Linux, Windows, and Android.[5] [2]
- 03
OSS wins this category by default. Builders need to inspect, fork, and self-host computer-use infrastructure, because the security surface is too sensitive to outsource to a closed binary. Cua is MIT-licensed, has 17.9k GitHub stars and growing, and ships native integrations with Claude Code, Cursor, and Codex, plus OpenAI, Anthropic, Ollama, and OpenRouter as model providers.[1] The OSS core is the distribution; the managed cloud is the renewal.
- 04
The founder is the infra builder. Francesco Bonacci co-created Windows Agent Arena, the canonical benchmark for OS-level computer-use agents, at Microsoft AI before starting Cua.[3] The person who designed the eval is now building the runtime on which everyone else has to pass it. That sequence (research the problem, define the benchmark, ship the production primitive) is structurally hard to imitate.
Problem
Every computer-use agent has the same first problem: where does it run?
Letting the agent loose on the user's host is a security non-starter: the LLM hallucinates an rm -rf, exfiltrates a credential, or installs a payload. Spinning up a desktop VM is the obvious answer, but every team building one ends up rebuilding the same six layers: Apple Virtualization Framework wiring, screen capture, input simulation, container orchestration, isolation policy, and a hot-start runtime that boots fast enough for a chat-speed agent loop.[1]
Browser agents got a head start because the browser is already a sandbox. Browser Use commoditized the agent framework on top — 98k+ stars and a $17M Felicis-led seed.[13] [21] Browserbase raised a $40M Series B at a $300M valuation to build managed browser infrastructure underneath.[22] But the browser is a subset — Anthropic and OpenAI both shipped computer-use APIs at the OS level precisely because the bulk of real work happens outside the tab.[7] [4]
There is no Browserbase for the desktop. Every team trying to ship a CUA-class product is paying the infrastructure tax in-house, and the tax is high enough that some give up and ship browser-only. Cua is the missing primitive: one command starts a Linux, macOS, Windows, or Android container from an MIT-licensed runtime, exposed through a computer-use interface any LLM can drive and running at 97% of native CPU speed on Apple Silicon.[1] [2]
72.7%
Claude Opus 4.6 on OSWorld
Feb 2026 — above the 72.36% human baseline[16]
5×
OSWorld SOTA gain in 16 months
14.9% (Oct 2024) → 72.7% (Feb 2026) — model risk retired[7][16]
$22B → $90B+
RPA market 2024 → 2030
The legacy automation budget computer-use replaces[6]
The model layer is past proof-of-concept — past the human baseline, in fact. The runtime layer is the open question — and the one Cua exists to answer.
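The three callout figures can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the compound-annual-growth framing for the RPA market is an added illustration, not a figure from the cited report, and because the 2030 number is quoted as "$90B+" the result is best read as a lower bound.

```python
# Figures quoted in the callouts above.
osworld_oct_2024 = 14.9    # % success, Claude 3.5 Sonnet at launch
osworld_feb_2026 = 72.7    # % success, Claude Opus 4.6
human_baseline = 72.36     # % success, OSWorld human baseline

# How many times higher the best published score is than the launch score.
gain_multiple = osworld_feb_2026 / osworld_oct_2024
# How far past the human baseline the frontier sits, in percentage points.
margin_over_human = osworld_feb_2026 - human_baseline

# RPA market endpoints in billions of dollars (2030 quoted as a floor).
rpa_2024_bn = 22.0
rpa_2030_bn = 90.0
years = 2030 - 2024
# Assumption: smooth compounding between the two endpoints.
implied_cagr = (rpa_2030_bn / rpa_2024_bn) ** (1 / years) - 1

print(f"OSWorld gain: {gain_multiple:.2f}x")
print(f"Margin over human baseline: {margin_over_human:.2f} points")
print(f"Implied RPA CAGR: {implied_cagr:.1%}")
```

The gain works out to a little under 5×, which is what the "5×" callout rounds to, and the margin over the human baseline is about a third of a percentage point.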
Why Now
The agent stack split into model, framework, and infrastructure layers — inside a single year.
Anthropic and OpenAI validated the demand curve at the model layer. Browser Use and Browserbase proved the framework/infrastructure split works. Cua is the OS-level counterpart to Browserbase: the runtime under everything else.
Developers can direct Claude to use computers the way people do — by looking at a screen, moving a cursor, clicking buttons, and typing text.
Anthropic[7]
Claude 3.5 Sonnet launch · Oct 2024
Operator can fill forms, place online orders, schedule appointments, and complete other repetitive tasks — a first step toward AI that acts on your behalf across the digital world.
OpenAI[4]
Operator launch · Jan 2025
The agent stack is splitting into model, framework, and infrastructure layers — the value will compound at the layer that handles the runtime, not the one that handles the prompt.
Browser agent market map[11]
Theta Labs · 2025
Three preconditions converged in sixteen months.
All three frontier labs now ship computer-use models. Anthropic opened the category with Claude 3.5 Sonnet in October 2024 — 14.9% on OSWorld, nearly 2× the next-best system, with Replit, Canva, Asana, DoorDash, Cognition, and The Browser Company on launch day.[7] OpenAI followed with Operator in January 2025, then folded it into ChatGPT agent in mid-2025.[4] [20] Google joined in October 2025 with the Gemini 2.5 Computer Use model.[19] Three labs, one shared assumption: the customer brings the desktop.
The models crossed the human baseline. Claude Sonnet 4.5 hit 61.4% on OSWorld in September 2025 — up from Sonnet 4's 42.2% just four months earlier.[15] Simular's Agent S crossed OSWorld's 72.36% human baseline in December 2025 at 72.6%; Claude Opus 4.6 followed at 72.7% in February 2026.[17] [16] Model capability is no longer the gating risk. The bottleneck moved to the runtime — boot time, snapshot/restore, isolation, fleet orchestration — exactly the surface Cua engineers against.[1]
Capital marked the adjacent layers and left this one open. In eleven months, investors priced every neighboring shelf of the agent-sandbox stack: Browser Use's $17M seed, Browserbase's $40M Series B at a $300M valuation, E2B's $21M Series A with 88% of the Fortune 100 on the platform, and Daytona's $24M Series A.[21] [22] [23] [24] Browser sandboxes and code sandboxes are now funded categories. The OS-level desktop runtime, the layer the computer-use models actually assume, is the one seat still unpriced, and Cua holds the OSS default with 17.9k stars and 50k+ engineers building on it.[1] [2]
Chart: Computer-use agents crossed the human baseline
Best published OSWorld success rate at each release. From 14.9% to past the 72.36% human baseline in sixteen months — the model layer stopped being the bottleneck.[7] [18] [15] [17] [16]
Source · Anthropic, OpenAI, Simular published OSWorld results · Oct 2024 – Feb 2026 [7][15][16][17][18]
How It Works
Three layers. One container per agent. Boot to action in under a second.

The 2026 surface: from one container to a product line.
Cua Driver — background computer-use. Agents drive native apps on macOS and Windows without stealing the cursor, focus, or Space — including non-AX surfaces like Chromium web content and canvas tools (Blender, Figma, DAWs, game engines). Same CLI and MCP server from Claude Code, Cursor, Codex, and custom clients; every session recorded as a replayable trajectory.[1] [26] This is the feature that turns computer-use from a demo you watch into a coworker that runs while you keep typing.
One SDK, every OS, bring your own image. Sandbox.ephemeral(Image.linux()) — or .macos(), .windows(), .android() — against the cloud or local QEMU, with custom .qcow2/.iso images self-hosted today and in the cloud next.[1] The uniform API across six runtimes is the contract competitors with one substrate cannot offer.
Cua-Bench: the eval and RL-environment loop. Run agents against OSWorld, ScreenSpot, Windows Agent Arena, and custom tasks; export trajectories for training.[1] [26] As labs buy RL environments for computer-use post-training, the benchmark registry is a second revenue surface, and it is the founder's home turf, given Windows Agent Arena.[3]
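To make the "one SDK, every OS" contract concrete, here is a minimal, self-contained sketch of what a uniform sandbox API implies for agent code. It does not import or reproduce the Cua SDK. `Image` and `Sandbox` are local stand-ins that borrow only the `Sandbox.ephemeral(Image.linux())` call shape quoted above; the context-manager behavior and the `click` and `type_text` methods are hypothetical and exist only for illustration.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """Stand-in OS image descriptor (illustrative, not the real Cua class)."""
    os_name: str

    @classmethod
    def linux(cls) -> "Image":
        return cls("linux")

    @classmethod
    def macos(cls) -> "Image":
        return cls("macos")

    @classmethod
    def windows(cls) -> "Image":
        return cls("windows")

    @classmethod
    def android(cls) -> "Image":
        return cls("android")


@dataclass
class Sandbox:
    """Stand-in sandbox that only records the actions it is asked to perform."""
    image: Image
    log: list = field(default_factory=list)
    closed: bool = False

    @classmethod
    @contextmanager
    def ephemeral(cls, image: Image):
        # Hypothetical lifecycle: create on entry, always tear down on exit.
        sandbox = cls(image)
        try:
            yield sandbox
        finally:
            sandbox.closed = True

    def click(self, x: int, y: int) -> None:  # hypothetical method
        self.log.append(f"click({x}, {y})")

    def type_text(self, text: str) -> None:  # hypothetical method
        self.log.append(f"type({text!r})")


def smoke_test(image: Image) -> list[str]:
    """Agent code written once; only the image argument varies per OS."""
    with Sandbox.ephemeral(image) as sandbox:
        sandbox.click(100, 200)
        sandbox.type_text("hello")
        return list(sandbox.log)


images = (Image.linux(), Image.macos(), Image.windows(), Image.android())
results = {image.os_name: smoke_test(image) for image in images}
for os_name, actions in results.items():
    print(os_name, actions)
```

The point of the sketch is the shape of `smoke_test`: nothing in it branches on the operating system, which is the property the text describes as the contract a single-substrate competitor cannot offer.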
Self-hosted is the funnel. The managed cloud is where the workload ships.
Hot-start under one second. The managed cloud snapshots a warm desktop image and restores it for every new agent session. The cost difference between a cold-boot VM and a hot-start image is roughly 60× — the difference between a runtime you can spin up per chat turn and one you can't.[2]
Cross-OS fleet orchestration. macOS, Linux, Windows, and Android containers from one control plane. Windows desktops in particular are exclusive to the cloud — Apple Silicon licensing makes self-hosting Windows impractical for most teams, which makes managed the only path for the workloads that actually require it (legacy ERP, .NET, native enterprise tooling).[2]
Observability, recording, and replay. Every agent action recorded as a video plus structured trace. The artifact stack is what turns an agent prototype into a production workload — eval harnesses, regression testing, incident debugging. The OSS gives you a container; the cloud gives you a system.[2]
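The "structured trace" idea in the observability item can be illustrated with a small sketch of how a recorded run might feed a regression check. Everything here is an assumption made for illustration: the `Step` schema, the JSON Lines encoding, and the `first_divergence` helper are hypothetical and are not taken from Cua's trajectory format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """One recorded agent action (hypothetical schema)."""
    index: int    # 0-based position in the run
    action: str   # e.g. "click" or "type"
    target: str   # free-form description of the UI element
    ok: bool      # whether the action succeeded


def to_jsonl(steps: list[Step]) -> str:
    """Serialize a run as one JSON object per line."""
    return "\n".join(json.dumps(asdict(s), sort_keys=True) for s in steps)


def from_jsonl(text: str) -> list[Step]:
    """Parse a run back from its JSON Lines form."""
    return [Step(**json.loads(line)) for line in text.splitlines() if line]


def first_divergence(baseline: list[Step], candidate: list[Step]) -> Optional[int]:
    """Return the index of the first step that differs, or None if the runs match."""
    for b, c in zip(baseline, candidate):
        if (b.action, b.target, b.ok) != (c.action, c.target, c.ok):
            return b.index
    if len(baseline) != len(candidate):
        # One run stopped early; the divergence is the first missing step.
        return min(len(baseline), len(candidate))
    return None


baseline = [
    Step(0, "click", "File menu", True),
    Step(1, "click", "Export...", True),
    Step(2, "type", "report.pdf", True),
]
# A later run where the final step failed.
candidate = baseline[:2] + [Step(2, "type", "report.pdf", False)]

restored = from_jsonl(to_jsonl(baseline))
print("round trip ok:", restored == baseline)
print("first divergence:", first_divergence(baseline, candidate))
```

A check like this is what lets a trace double as a regression test: a new model or a UI change is run against the same task, and the first diverging step points at where to start debugging.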
Docker for Computer-Use
The OS-level agent stack is at its pre-Docker moment.
The pattern is exact. Before Docker, every team rebuilt the same chroot plus init system plus image layer plus networking stack. After Docker, none of them did — and the value moved up to orchestration, registries, and managed clouds. Computer-use is at the same moment now.
The container is the wedge. The fleet is the business.
The runtime stops being a moat the moment it becomes a standard. Docker the company didn't capture the orchestration value — Kubernetes, registries, and the hyperscalers did. Cua's thesis is to be the OSS standard at the runtime layer and the first mover at the orchestration layer. Browserbase ran the same play in browsers and built a real business on it.[5]
The operational memory compounds. Every agent run leaves a trace inside the container — what worked, what failed, which UI surfaces broke, which recovery strategies converged. That dataset is the natural input for RL training, regression evals, and reliability improvements. Cua-Bench is already in the repo for exactly this loop.[1]
The container is the API contract that survives model churn. Frontier models cycle every six months. The OS surface doesn't. A team that builds against Cua's computer-use interface today will run the same code against whichever model is best in 18 months. Abstraction over substrate is the durable position.
How can AI agents interact with operating systems, desktop applications, and browsers without jeopardizing security or sacrificing performance?
Market
The runtime layer is structurally larger than the framework layer.
Near-term ICP is every team shipping a computer-use agent: foundation labs running evals (HUD, on the customer list), legacy automation startups (Fira), academic research, YC AI cohort companies, and the 50k+ engineers already building on Cua.[2] The buyer is a technical founder writing a research preview, or a Series-A team scaling fleet ops — both want OSS by default and managed when production demands it.
Longer-term, the category is agent infrastructure as a line item. RPA is a ~$22B (2024) market growing toward $90B+ by 2030, driven by enterprise digitization.[6] The CUA-class agent stack is the AI-native successor — the segment where workflows RPA can't reach (legacy ERP, design tools, CAD, native apps with no API) finally become automatable. Browserbase has proven a real business sits under browser agents; the OS-level fleet is structurally a larger surface.
Chart: Cua OSS adoption, GitHub stars since launch
Zero to 17,861 stars in sixteen months, with no paid distribution. Growth re-accelerated in 2026 as Cua Driver and CuaBot shipped — roughly 6k stars added since January.[1]
Source · GitHub stargazer timestamps, trycua/cua, sampled via GitHub API · Jun 12, 2026 [1]
Every team building a computer-use agent has to solve the same desktop problem. Cua should be the answer by default — and that's how the next generation of agent infrastructure gets written.
Competitive landscape
Four neighbors. None of them ship the OSS desktop container.
The frontier labs are upstream. Browser infra is adjacent. Dev sandboxes are a different shape. Agent frameworks are downstream consumers. Cua's position — OSS desktop runtime, multi-OS, multi-LLM — has no direct equivalent.
Chart: Capital priced every adjacent layer in eleven months
Disclosed rounds across the agent-sandbox stack, Mar 2025 – Feb 2026: Browser Use (framework), Browserbase (browser infra, $300M valuation), E2B and Daytona (code sandboxes). The OS-level desktop runtime is the seat still unpriced.[21] [22] [23] [24]
Source · Company and press announcements · 2025–2026 [21][22][23][24]
The model layer is shipping the brains. Someone has to ship the body. Cua is the open-source default for the runtime — and the OSS default usually wins infrastructure.
Founder deep dive
The person who wrote the benchmark is now writing the runtime.
Founder & team
Risks & mitigations
What we're watching
References
- [1]GitHub — trycua/cua (MIT, 17.9k stars, 1.1k forks · Jun 2026)
- [2]Cua — Product homepage (50k+ engineers, <1s hot-start, multi-OS)
- [3]Windows Agent Arena — Evaluating multi-modal OS agents at scale (arXiv 2409.08264, Francesco Bonacci et al.)
- [4]OpenAI — Introducing Operator (Jan 23, 2025)
- [5]Browserbase — Cloud browsers for AI agents (~$40M raised, infra layer reference)
- [6]Grand View Research — RPA market $22B (2024) → $90B+ by 2030
- [7]Anthropic — Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku (Oct 22, 2024)
- [8]Y Combinator — Cua launch: Docker container for computer-use agents
- [9]Y Combinator — Cua company profile (X25, Diana Hu)
- [10]Model Context Protocol — Open standard for connecting AI assistants to tools (Anthropic, Nov 2024)
- [11]Theta Labs — Browser agent market map (X, 2025)
- [12]OSWorld — Benchmarking multimodal agents for open-ended tasks on real computer environments
- [13]Browser Use — Open-source agent framework for web (98k+ stars · Jun 2026)
- [14]Cua Discord — 600+ developer community
- [15]Anthropic — Introducing Claude Sonnet 4.5: 61.4% on OSWorld, up from Sonnet 4's 42.2% four months earlier (Sep 29, 2025)
- [16]Anthropic — Claude Opus 4.6: best computer-using model, 72.7% on OSWorld (Feb 2026)
- [17]Simular — Agent S crosses OSWorld's 72.36% human baseline at 72.6% (Dec 16, 2025)
- [18]OpenAI — Computer-Using Agent (CUA): 38.1% on OSWorld at launch (Jan 2025)
- [19]Google DeepMind — Introducing the Gemini 2.5 Computer Use model (Oct 2025)
- [20]OpenAI — Introducing ChatGPT agent (Jul 2025; Operator folded in and retired Aug 2025)
- [21]TechCrunch — Browser Use raises $17M seed led by Felicis (Mar 23, 2025)
- [22]Built In SF — Browserbase raises $40M Series B at a $300M valuation, led by Notable Capital (Jun 2025)
- [23]E2B — $21M Series A led by Insight Partners; 88% of the Fortune 100 on the platform (Jul 2025)
- [24]PR Newswire — Daytona raises $24M Series A led by FirstMark, with Datadog and Figma Ventures (Feb 5, 2026)
- [25]Launch HN — Cua (YC X25): open-source Docker container for computer-use agents (Apr 2025)
- [26]Cua Docs — Cua Driver, CuaBot, Cua-Bench, and the Sandbox SDK (cua.ai/docs)


