
Halluminate
The RL environment and data factory for computer-use AI.
Thesis
- 01
Capability gains keep coming from post-training, not bigger pre-training runs. Returns from scaling parameters and tokens are flattening. The recent agentic capability jumps — o-series reasoning, ChatGPT Agent, Claude Computer Use — were unlocked by reinforcement learning against purpose-built environments, not by another order of magnitude on the base model.[3] [4] The receipts are on the benchmark: OSWorld success went from 12% to 84% in two years, crossing the human baseline in 2026 — almost entirely on RL post-training.[23] [28]
- 02
Every lab is short on environments. Every lab is building computer-use agents; every lab is short on environments to train and grade them on. Live web training rate-limits, breaks when sites change, and resets state in ways that destroy training signal.[1] Halluminate ships the missing piece as its product, not as a side project some lab team will get to next quarter.
- 03
Contracts fund the catalog; the catalog is the company. Labs buy tasks and verifiers; the simulators underneath amortize across every subsequent contract. Every agent run produces a labeled trajectory; every environment seeds the next; every checker doubles as a reward signal. The defensible asset is the catalog — reusable modules, verifiable tasks, and the trajectory corpus nobody can buy off the shelf.
- 04
The customer is every frontier lab and every serious agent company. Hyperscalers spent $344B on AI in 2025 alone.[5] Anthropic's leadership has discussed putting more than $1B into RL environments over a single year.[16] The same dollars that funded pre-training are now flowing into post-training stacks — and the post-training stack is inseparable from environment infrastructure. This is the most capital-rich, urgency-pressed customer set in software.
Problem
Computer-use agents work in the demo. They break in production. The gap is environment quality.
Browser and computer-use AI is the most-watched capability in the model lab roadmap — and the most fragile in deployment. When OSWorld launched in 2024, the best model completed 12.24% of routine desktop tasks against a human baseline of 72.36%.[23] WebArena showed the same gap on multi-app web workflows.[1] a16z's framing is identical: "current agent offerings more closely resemble advanced RPA tools than true autonomous systems."[13] The gap has since closed at exactly the rate labs bought environments — which is the point.
The reason isn't model capacity. The reason is that labs train agents on whatever data they can scrape — public web recordings, OSS web benchmarks, synthetic prompts — and then fine-tune them on a few thousand hand-curated trajectories. None of that mirrors the production stack the agent will actually run against. The Salesforce instance, the Slack workspace, the ServiceNow ticket queue, the QuickBooks reconciliation flow — these are the surfaces the customer cares about, and they look nothing like the public web.
Live-web training makes it worse. Real websites can't be reset. They rate-limit aggressively, ban automation, and punish exploration. They change layout overnight. They cost real money when an agent buys the wrong flight. Every lab knows this; every lab has tried to build a stand-in internally; every lab has discovered that environment authoring is a product problem masquerading as a research problem.
- $344B · AI capex (2025) · Hyperscaler spend now spilling into training, eval, and env infra
- $14.3B · Meta → Scale AI · Single deal, with labs paying premium prices for training data
- $15B · Applied Intuition · AV simulation infra that proves the env-infra business model
Sources · LA Times AI capex 2025[5] · NYT Meta-Scale deal[7] · Reuters Applied Intuition valuation[11]
Why Now
The post-training era is here. Environment quality is the next constraint.
Three trends collided in the same eighteen months: agents shipped to production, pre-training gains flatlined, and every lab woke up needing the same thing — verifiable environments to train and grade computer-use models on.
Current agents still exhibit significant limitations in capability — struggling with complex or unfamiliar interfaces — and efficiency, operating too slowly and expensively to compete effectively with human operators.
a16z[13] · Computer-Use & Agentic Coworkers
The RL environment platforms are becoming foundational infrastructure for anyone looking to train generalist AI workers.
Felicis[14] · Rocket Fuel for AI
The teams that win won't look like traditional tooling vendors; they'll look like thought partners embedded with frontier labs, compounding trust and research depth over time.
Wing VC[33] · Who Will Win the RL Environment Market
Three preconditions converged in the same eighteen months.
Computer-use agents are now first-party products. OpenAI shipped ChatGPT Agent. Anthropic shipped Claude Computer Use. Google launched Project Mariner. Browserbase shipped Director.[3] [4] [13] [15] The category went from research demo to flagship product in twelve months. Every one of those launches has the same next-step problem: making the agent reliable on the long tail of enterprise apps.
Post-training is where capability now comes from. The o-series, computer use in Claude 3.5 Sonnet, and ChatGPT Agent were all RL post-training stories. Agents can't learn knowledge work through real-world trial and error; they need custom environments that faithfully simulate reality and reward success.[12] The output of post-training is no better than the environment it was trained against. The benchmark record makes the causality legible: OSWorld went from 14.9% (Claude 3.5 Sonnet, October 2024) to 38.1% (OpenAI's CUA, January 2025) to 61.4% (Sonnet 4.5, September 2025) to 84% (Opus 4.8, May 2026). Each jump was an RL-against-environments release, not a bigger base model.[24] [25] [26] [28]
The money is following the bottleneck. Hyperscaler AI capex hit $344B in 2025.[5] [6] Meta paid $14.3B for Scale.[7] Surge held funding talks at a $25B+ valuation.[9] [20] Mercor quintupled to $10B in eight months on a $450M run rate.[19] And the spend has gone explicitly environmental: The Information reported Anthropic leadership discussing more than $1B on RL environments over the next year, with typical lab contracts running six to seven figures per quarter.[16] [17] Felicis calls RL environment platforms "foundational infrastructure for anyone looking to train generalist AI workers."[14]
Chart · Computer-use agents crossed the human baseline in two years
OSWorld task success rate by flagship model release. The 2024 paper's best model scored 12.24% against a 72.36% human baseline; Claude Opus 4.8 reached 84% on OSWorld-Verified in May 2026 (Anthropic updated its evaluation harness for 2026 scores). Every step on this curve was an RL post-training release — trained against purpose-built environments.[23] [24] [25] [26] [27] [28]
Source · OSWorld benchmark · Anthropic & OpenAI model announcements (2024–2026)
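The arithmetic behind the curve can be checked in a few lines. The sketch below uses only the scores cited in this memo; the function and variable names are illustrative. One caveat is carried into the code: the Opus 4.8 figure is on OSWorld-Verified with an updated harness, so the last step is not strictly like-for-like.

```python
# OSWorld scores as cited in this memo (percent task success).
# Caveat: the final entry is OSWorld-Verified on an updated harness,
# so comparisons across the whole series are approximate.
HUMAN_BASELINE = 72.36

releases = [
    ("OSWorld paper best model (2024)", 12.24),
    ("Claude 3.5 Sonnet (Oct 2024)", 14.9),
    ("OpenAI CUA (Jan 2025)", 38.1),
    ("Claude Sonnet 4.5 (Sep 2025)", 61.4),
    ("Claude Opus 4.6", 72.7),
    ("Claude Opus 4.8 (May 2026)", 84.0),
]


def point_gains(series):
    """Percentage-point gain between each pair of consecutive releases."""
    return [
        (later[0], round(later[1] - earlier[1], 2))
        for earlier, later in zip(series, series[1:])
    ]


def first_above(series, threshold):
    """Name of the first release whose score exceeds the threshold, or None."""
    for name, score in series:
        if score > threshold:
            return name
    return None


gains = point_gains(releases)
crossing = first_above(releases, HUMAN_BASELINE)

for name, gain in gains:
    print(f"{name}: +{gain} points")
print("First release above the human baseline:", crossing)
```

On these figures the first cited score above the 72.36% human baseline is Opus 4.6 at 72.7%, and the two largest single-release gains are the steps to 38.1% and to 61.4%.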
How It Works
Two products. One loop. Environments and the data they produce.
The loop is the product.
Design task → simulate → instrument → verify → train → review. Halluminate runs the same loop the best in-house lab teams run, packaged as infrastructure. Customers pick a workflow, Westworld stages a sandbox for it, checkers score every episode, and Athena's reviewers triage the failures into reward models. The loop already shows up in customer numbers: one customer reported a ~20% improvement in date-picking performance after training against Halluminate's flight-booking simulator.[29]
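To make the loop above concrete, here is a minimal sketch of the simulate, instrument, and verify steps, assuming a resettable sandbox with a `reset`/`step` interface and a checker that returns a score the trainer can use directly as reward. Every name in it (`FlightBookingSim`, `check_booking`, `run_episode`, the two policies) is hypothetical; it is a toy illustration, not Halluminate's or Westworld's actual API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlightBookingSim:
    """Toy stand-in for a simulated booking site (hypothetical)."""
    target_date: str
    selected_date: Optional[str] = None
    submitted: bool = False
    log: list = field(default_factory=list)

    def reset(self):
        # Unlike a live site, a simulator returns to a known state on demand.
        self.selected_date = None
        self.submitted = False
        self.log = []
        return {"page": "search", "selected_date": None}

    def step(self, action):
        # Instrumentation: every action is recorded for the trajectory.
        self.log.append(action)
        if action["type"] == "pick_date":
            self.selected_date = action["value"]
        elif action["type"] == "submit":
            self.submitted = True
        page = "done" if self.submitted else "search"
        return {"page": page, "selected_date": self.selected_date}


def check_booking(env):
    """Checker: 1.0 only if the target date was submitted. Doubles as reward."""
    ok = env.submitted and env.selected_date == env.target_date
    return 1.0 if ok else 0.0


def run_episode(env, policy, max_steps=5):
    """Run one episode and return its labeled trajectory."""
    obs = env.reset()
    for _ in range(max_steps):
        obs = env.step(policy(obs))
        if obs["page"] == "done":
            break
    return {"trajectory": list(env.log), "reward": check_booking(env)}


def good_policy(obs):
    if obs["selected_date"] is None:
        return {"type": "pick_date", "value": "2026-03-14"}
    return {"type": "submit"}


def bad_policy(obs):
    # Picks the day after the target: a date-picking failure.
    if obs["selected_date"] is None:
        return {"type": "pick_date", "value": "2026-03-15"}
    return {"type": "submit"}


env = FlightBookingSim(target_date="2026-03-14")
good = run_episode(env, good_policy)
bad = run_episode(env, bad_policy)
print(good["reward"], bad["reward"])
```

The same environment instance serves both episodes because `reset` restores a known state, which is the property live-web training lacks.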
Delivery is verification-gated. A lab runs a smaller model against the environment; a second model grades the runs and flags errors (for example, whether the agent invoked the right tool for the task); then a larger model stress-tests the environment again. Failures route back to Halluminate to debug before the contract is accepted and paid. Increasingly the training target is the decision layer, meaning the reasoning that picks the right tool rather than the execution of the tool itself, which makes the verifiers, not the simulator, the scarce artifact. An environment that survives acceptance is lab-grade by construction.
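The acceptance gate described above can be sketched as a simple rule over graded runs. This is an illustration under stated assumptions: the memo does not say how a lab attributes a failure to the environment rather than to the agent, so the `env_fault` flag, the stage names, and the `acceptance_gate` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GradedRun:
    task_id: str
    stage: str        # "small_model" or "large_model" (hypothetical labels)
    right_tool: bool  # grader verdict: did the agent invoke the right tool?
    env_fault: bool   # assumption: the grader can blame the environment


REQUIRED_STAGES = ("small_model", "large_model")


def acceptance_gate(graded_runs):
    """Accept only if both passes ran and no run exposed an environment fault."""
    stages_seen = {run.stage for run in graded_runs}
    missing = [s for s in REQUIRED_STAGES if s not in stages_seen]
    route_back = sorted({run.task_id for run in graded_runs if run.env_fault})
    return {
        "accepted": not missing and not route_back,
        "missing_stages": missing,
        "route_back": route_back,  # tasks the vendor must debug before payment
    }


runs = [
    GradedRun("book-flight-01", "small_model", right_tool=True, env_fault=False),
    GradedRun("book-flight-02", "small_model", right_tool=False, env_fault=False),
    GradedRun("book-flight-01", "large_model", right_tool=True, env_fault=False),
    GradedRun("book-flight-02", "large_model", right_tool=True, env_fault=True),
]

first_pass = acceptance_gate(runs)

# After the vendor fixes the faulty task, the same run grades clean.
runs[3] = GradedRun("book-flight-02", "large_model", right_tool=True, env_fault=False)
second_pass = acceptance_gate(runs)
print(first_pass, second_pass)
```

In this sketch an agent mistake (`right_tool=False`) does not block acceptance on its own; only a fault attributed to the environment does. That split is a design choice of the example, not something the source specifies.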
Reusable modules bend the curve. Authentication, billing, search, form fills, ticket queues, notification banners: the same primitives show up in every enterprise UI. Each new environment shares more of its scaffolding with the last. The catalog grows superlinearly in engineering hours, the same way Applied Intuition's scenario library did in AV.[11]
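A small sketch shows why shared primitives lower the cost of each new environment. The module names come from the list above; the three environments and their module mixes are invented for illustration and do not describe Halluminate's actual catalog.

```python
def reuse_by_environment(environments):
    """For each environment, count modules reused from the catalog vs. built new."""
    catalog = set()
    report = []
    for name, modules in environments:
        needed = set(modules)
        reused = needed & catalog
        built = needed - catalog
        report.append((name, len(reused), len(built)))
        catalog |= needed  # everything built becomes available to later work
    return report, catalog


# Hypothetical build order and module mixes.
environments = [
    ("crm", ["authentication", "search", "form_fills", "notification_banners"]),
    ("helpdesk", ["authentication", "search", "ticket_queues", "notification_banners"]),
    ("accounting", ["authentication", "billing", "search", "form_fills"]),
]

report, catalog = reuse_by_environment(environments)
for name, reused, built in report:
    print(f"{name}: reused {reused}, built {built}")
```

In this toy ordering the first environment builds four modules from scratch, while the second and third each reuse three and build one.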
Interoperable with the agent stack labs already use. Westworld plugs into popular agent frameworks, browser automation infra (including Browserbase),[15] and the standard training pipelines labs run. No new framework to adopt — Halluminate becomes the env layer beneath whatever the customer already runs.
Environments Are the New Datasets
Static data taught models to predict. Environments teach agents to act.
The shift from "sweatshop data" to simulation-as-data is the through-line of the last twelve months of frontier research — SemiAnalysis calls the winners "data foundries."[31] Every lab is saying the same thing. The companies that build the environment layer become the data layer of the next generation.
Static datasets carried pre-training. Post-training needs environments.
Why static data ran out of room. Pre-training was about scraping the world. Post-training is about practicing in it. A static label set can teach a model what a "good" outcome looks like once. An interactive environment teaches it how to recover when something unexpected happens — and recovery is most of what agent reliability turns out to be.
The pattern the labs already ran. The earliest lab investments in environment infrastructure — Procgen, DeepMind's StarCraft sandbox, the OpenAI Gym lineage — were research bets that environments are the scarcest resource in RL.[8] The same pattern is repeating one level up the stack: instead of toy gym tasks, the scarce environments are real enterprise workflows.
Why environments and the data they produce are sold together. Every agent episode against a Halluminate environment produces a trajectory, a checker score, an annotator review, and a reward signal — exactly the training input the labs need for the next model release. Customers buy environments today and end up paying for the resulting trajectory corpus on every retrain.
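The four artifacts named above (trajectory, checker score, annotator review, reward signal) can be pictured as one record per episode. The schema below is a hypothetical sketch for illustration; the field names and the JSON Lines export are assumptions, not Halluminate's actual data format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class EpisodeRecord:
    environment: str
    trajectory: List[dict]           # ordered agent actions
    checker_score: float             # automated verifier output
    annotator_review: Optional[str]  # None until a human has triaged it
    reward: float                    # signal handed to the trainer


def to_jsonl(records):
    """Serialize records as JSON Lines, one episode per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


records = [
    EpisodeRecord(
        environment="flight-booking",
        trajectory=[{"type": "pick_date", "value": "2026-03-14"}, {"type": "submit"}],
        checker_score=1.0,
        annotator_review="correct date selected",
        reward=1.0,
    ),
    EpisodeRecord(
        environment="flight-booking",
        trajectory=[{"type": "pick_date", "value": "2026-03-15"}, {"type": "submit"}],
        checker_score=0.0,
        annotator_review=None,
        reward=0.0,
    ),
]

jsonl = to_jsonl(records)
pending_review = [i for i, r in enumerate(records) if r.annotator_review is None]
print(jsonl)
```

Keeping the checker score and the reward as separate fields leaves room for a reviewer to adjust the reward without overwriting what the automated checker originally reported.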
Market
The buyer set is small. The budget is enormous.
Frontier model labs. Three to five labs control the high end of post-training spend, and the line item is now public: Anthropic leadership has discussed over $1B on RL environments in a year, and typical environment contracts run six to seven figures per quarter.[16] [17] Each lab has a small team trying to produce computer-use environments fast enough to keep up with the agent roadmap. Halluminate is already in an active pilot with one of them, which is targeted to convert this quarter. Sustained AI capex creates adjacent demand for training and eval infra that helps labs realize their model investments.[5] [6]
Serious agent companies. Browser Use, Yutori, Manus, Browserbase Director, and the next wave of agent products all need the same thing the labs need.[13] [15] Their differentiator is reliability in the customer's actual stack — which means training and grading against environments that mirror that stack. Halluminate sells the same product on the same loop.
Enterprise. The medium-term buyer is the enterprise platform team deploying internal agents. Vertical functions — marketing, finance, sales, HR — all require company-specific tuning against company-specific surfaces.[13] The same Westworld + Athena loop powers internal agent evaluation before the agent ever touches production. Halluminate has since planted its flag on the highest-value vertical first: the company now leads with RL environments for financial services — Excel modeling, investment banking, private equity, and consulting workflows — where task value per episode is highest and domain expertise is the barrier.[30] The demand side has gone vertical too: Anthropic now ships finance-agent templates with Excel and Moody's integrations — financial institutions are roughly 40% of its top-50 customers — and Rogo's $160M Series D at $2B (April 2026, a 2.7× step-up in under four months) priced what agentic finance is worth.[34] [35] Whoever trains those models needs finance-grade worlds to train them in.
Chart · The training-data and environments layer keeps repricing upward
Reported valuations across the data/environments stack. Scale priced at ~$29B in Meta's June 2025 deal; Surge held talks at ~$25B; Applied Intuition — the environments business model proven in AV — sits at $15B; Mercor quintupled to $10B in October 2025; Fleet, the first env-native startup on the curve, reached $750M in June 2026 on a months-old revenue base.[7] [20] [11] [19] [18]
Source · NYT · Bloomberg · Reuters · TechCrunch · The Information (2025–2026)
Every frontier lab is trying to build the environment stack in-house. Every one of them is failing to keep up with its own agent roadmap. Halluminate is the only company shipping the catalog as a product.
Competitive landscape
Twenty entrants, three to five winners. The fight is over who is infrastructure and who is labor.
When we invested, Halluminate was nearly alone in selling environments as a product. By mid-2026 roughly twenty funded companies sell into the category, and Wing projects consolidation to three to five winners by 2030.[16] [33] The winner's test is simple: reusable infrastructure over labor, depth over breadth, embedded with the labs. Score the field against it.
Scale and Surge sell workforce. Fleet and Deeptune sell breadth. Mechanize sells depth on code. Wing's test for the survivors — infrastructure over labor, depth over breadth, embedded with the labs — is the test Halluminate was built to pass, pointed at the highest-value workflows in the economy.
Founder deep dive
A product-research operator and a startup data engineer building the part of the lab stack no lab has bandwidth for.
Founder & team
Risks & mitigations
What we're watching
References
- [1]WebArena — Realistic web environment for autonomous agents
- [2]Y Combinator — Halluminate company profile
- [3]OpenAI — Introducing ChatGPT agent (computer-use launch)
- [4]Anthropic — Computer Use documentation
- [5]LA Times — Big Tech AI spending to reach $344B in 2025
- [6]New York Times — AI spending and the real economy (2025)
- [7]New York Times — Meta invests $14.3B in Scale AI
- [8]OpenAI — Procgen Benchmark: gym environments for generalization in RL
- [9]Reuters — Surge AI explores $1B raise at $30B+ valuation
- [10]TechCrunch — Adept raises $350M for computer-use agents
- [11]Reuters — Applied Intuition valued at $15B (AV simulation infrastructure)
- [12]Mechanize — Sweatshop data is over (RL environments thesis)
- [13]a16z — The rise of computer use and agentic coworkers
- [14]Felicis — Rocket Fuel for AI: RL environments and the RLaaS market
- [15]BuiltIn SF — Browserbase Director and $40M Series B
- [16]TechCrunch — Silicon Valley bets big on 'environments' to train AI agents (Anthropic's $1B+ RL-env plans)
- [17]Epoch AI — An FAQ on reinforcement learning environments (contract sizes, replica costs)
- [18]The Information — RL gym startup Fleet reaches $750M valuation on surging lab demand
- [19]TechCrunch — Mercor quintuples valuation to $10B with $350M Series C
- [20]Bloomberg — Scale rival Surge AI in talks for funding at $25B value
- [21]Prime Intellect — Environments Hub: a community platform to scale RL to open AGI
- [22]Mechanize — The upcoming GPT-3 moment for RL (replication training)
- [23]OSWorld — Benchmarking multimodal agents in real computer environments (human baseline 72.36%)
- [24]Anthropic — Introducing computer use with Claude 3.5 Sonnet (14.9% on OSWorld)
- [25]OpenAI — Computer-Using Agent / Operator (38.1% on OSWorld)
- [26]Anthropic — Claude Sonnet 4.5 (61.4% on OSWorld)
- [27]Anthropic — Claude Opus 4.6 (72.7% on OSWorld)
- [28]Anthropic — Claude Opus 4.8 (84% on OSWorld-Verified, May 2026)
- [29]Hacker News — Launch HN: Halluminate (YC S25), simulating the internet to train computer use
- [30]Halluminate — RL environments for financial services (company site, 2026)
- [31]SemiAnalysis — RL environments and RL for science: data foundries and multi-agent architectures
- [32]Epoch AI — An FAQ on Reinforcement Learning Environments: contract sizes, replica pricing, exclusivity premiums (Jan 2026)
- [33]Wing Venture Capital — Who Will Win the RL Environment Market—and Why (Jan 2026)
- [34]Fortune — Anthropic deepens Wall Street push: finance agents, Microsoft 365 integration, Moody's partnership (May 2026)
- [35]PR Newswire — Rogo raises $160M Series D at $2B to scale the agentic platform for finance (Apr 2026)
- [36]Applied Compute — The Advantage You Own: $80M led by Kleiner Perkins at $1.3B (Apr 2026)
- [37]Mechanize (X) — $9.1M raised at a $500M post-money valuation (Apr 2026)
- [38]SiliconANGLE — Deeptune raises $43M to accelerate AI learning through virtual training gyms (Mar 2026)
- [39]Handshake — Handshake acquires Cleanlab: evaluations, AI safety, RL environments (Jan 2026)
- [40]Turing — RL environments for agent training and evaluation (product page, 2026)
- [41]TechCrunch — micro1, a Scale AI competitor, touts crossing $100M ARR (Dec 2025)
- [42]Halluminate — Careers: help us train financial superintelligence; roles supporting $MMs in 2026 revenue



