
Airweave
The open-source context retrieval layer for AI agents.

$6M: Seed, Jul 2025 (FCVC-led, with Lux Capital, YC, Orange Collective, Pioneer Fund, Shay Banon)
50+: Source connectors (Slack, Notion, Drive, Linear, Salesforce, Postgres…)
$419B: Agentic AI tools TAM by 2034 (from $6.2B in 2024, ~52% CAGR)
Thesis
- 01 Retrieval is the new database for agents. Models are commoditizing; context is not. Every meaningful agent dies at "how do I give it our data?", and the answer is some private mash-up of ingestion code, OAuth plumbing, vector store, and ACL spaghetti. OpenAI's MCP launch made it official: every remote MCP server is expected to behave like a search engine, with search and fetch as the contract.[6] Airweave is that contract, ready out of the box.
- 02 The connector matrix is the moat. Slack, Notion, Drive, Linear, Salesforce, GitHub, HubSpot, Postgres, Stripe, Zendesk, Confluence, Jira, Asana: 50+ integrations live, with the long tail still coming.[3] The work that compounds isn't the vector search; it's OAuth handshakes, schema drift handling, incremental sync, and ACL replication. Hard to clone in a weekend, harder to maintain at scale, and exactly the shape that turned Plaid and Merge into durable companies.
- 03 OSS + agents-first wins distribution. MIT license, FastAPI + Vespa stack, Python and TypeScript SDKs, a CLI, and a hosted cloud.[3] Same playbook Vercel ran for deployment and Supabase ran for Postgres: ship a clean OSS surface, let coding agents recommend it, build the managed plane on top. The category default is whatever ChatGPT tells the next agent dev to pip install.
- 04 Permissions and freshness are the things teams underestimate. Anyone can ship vector search in a weekend. Multi-tenant ACLs, incremental re-indexing on schema changes, conflict resolution across overlapping sources: that's the part that breaks at customer #20. Airweave is designing the permission and freshness plane from day one, not retrofitting it after the first SOC 2 audit.
Problem
Every AI team rebuilds the same retrieval stack. None of it is novel; all of it has to work.
Connectors to a dozen SaaS apps. OAuth flows for each one. Schema detection that survives the next API change. Document chunking across DOCX, PDF, PPTX, XLSX. An embedding pipeline. A vector store. A search API. ACL enforcement so agent queries only return what the end user is allowed to see. Incremental sync so the index doesn't go stale by Friday. None of it is novel — but every agent team writes a half-finished version of it themselves, and the half-finished version is exactly what breaks the first time a customer's Slack has 200,000 messages.
The OpenAI MCP launch made the problem explicit. Every remote MCP server is now expected to implement search and fetch — to behave like a search engine over a customer's data.[6] That's the entire retrieval stack as a public contract: ingestion + indexing + ranking + access control. Every team adopting MCP either rebuilds it or rents it. Airweave is the rent option, open-sourced.
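To make the contract concrete, here is a minimal sketch of the two operations in plain Python. The `Document` type, the `InMemoryIndex` class, and the keyword-overlap scoring are illustrative assumptions, not Airweave's or OpenAI's actual API; a production server would sit on a real index and ranking model, and would add access control on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical record shape; real connectors carry far more metadata."""
    id: str
    title: str
    text: str


class InMemoryIndex:
    """Toy stand-in for the ingestion + indexing + ranking stack."""

    def __init__(self) -> None:
        self._docs: dict[str, Document] = {}

    def ingest(self, doc: Document) -> None:
        # Upsert by id so re-ingesting a changed document replaces the old copy.
        self._docs[doc.id] = doc

    def search(self, query: str, limit: int = 5) -> list[str]:
        """Return ids of matching documents, best match first."""
        terms = set(query.lower().split())
        scored = []
        for doc in self._docs.values():
            words = set(f"{doc.title} {doc.text}".lower().split())
            overlap = len(terms & words)
            if overlap:
                scored.append((overlap, doc.id))
        # Highest overlap first; ties broken by id for deterministic output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored[:limit]]

    def fetch(self, doc_id: str) -> Document:
        """Return the full document for an id produced by search()."""
        return self._docs[doc_id]


index = InMemoryIndex()
index.ingest(Document("slack-1", "Launch plan", "ship the connector on friday"))
index.ingest(Document("notion-7", "Roadmap", "connector backlog and launch dates"))
hits = index.search("connector launch")
top = index.fetch(hits[0])
```

The split matters: `search` returns cheap identifiers so the agent can decide what is worth reading, and `fetch` pays the cost of returning full content only for those ids.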
The cost isn't writing the code once. It's the long tail — keeping forty connectors green as their APIs drift, keeping the index fresh as customer data changes, keeping permissions correct as agents query across workspaces. That's not a hackathon project; it's a company. And every agent founder we talked to would rather not own it.
50+: Connectors live (Slack, Notion, Drive, Linear, Salesforce, GitHub, Postgres, Stripe…)
6.4k: GitHub stars (805 forks · active PR cadence · MIT license)
176: Show HN points (40 comments · design partners at Cursor & Claude Desktop)
GitHub repo[3] · Show HN thread[4]
Chart: Airweave GitHub stars, Dec 2024 to Jun 2026
Cumulative stars from per-stargazer timestamps. The two inflections map to events, not drift: May 2025 (+1,770 — Show HN front page and the X25 Product Showcase win) and Oct–Nov 2025 (+2,263 — the post-seed shipping run). 6,437 stars as of Jun 12, 2026.[3] [4] [8] [16]
Source · GitHub API stargazer data, airweave-ai/airweave (retrieved Jun 12, 2026)
Why Now
MCP turned retrieval into a public protocol — and a category is forming around the open default.
Three trends are colliding in the same twelve months: MCP standardized the agent–data contract, the frontier labs are keeping their stacks closed, and every agent team is independently discovering the cost of building this themselves.
"Your MCP remote server should resemble a search engine. Every server is expected to implement search and fetch."
OpenAI MCP docs[6] · Spec for remote MCP servers
"Does everyone need to implement their own RAG for tools, or is there a layer waiting to be standardized?"
Yoko Li[7] · Partner, a16z
"Your agent gets a simple search endpoint. Your users get reliable, personalized experiences."
Lennert Jansen[2] · Co-founder, Airweave
Three preconditions converged in the same twelve months.
MCP turned retrieval into a protocol. OpenAI's remote MCP launch (Jun 4 2025) mandated search and fetch on every server — a public spec for what agents need from data sources.[6] Anthropic's MCP spec did the same on the client side.[10] What used to be a thousand bespoke integrations is now a single contract — and a single contract is exactly the surface a default platform can fill.
The connector matrix is being rebuilt everywhere, badly. Every agent startup we talked to has a half-finished integrations folder; every one of them would rather not own it. The work is repetitive, the maintenance is endless, and the differentiation is zero — the perfect shape for an OSS default to absorb. Composio is already a paying Airweave customer; Cursor and Claude Desktop are design partners.[3] The category is forming in real time.
Frontier labs are keeping the plumbing closed. ChatGPT Connectors, Claude's data integrations, Codex's VM — all internal-only.[6] As OpenAI and Anthropic evolve from model vendors to product companies, holding the retrieval middleware private preserves their UX edge. That creates the structural opening for an open, vendor-agnostic stack that every agent builder outside the walled gardens can adopt. The same dynamic that gave Vercel space against AWS and Supabase space against Firebase.
A year on, the protocol bet resolved. PulseMCP alone lists 17,974 MCP servers as of June 2026[20]; tracked counts across the major public registries grew roughly 38% in the four months to mid-April 2026 (~6,800 → ~9,400).[21] Every one of those servers is expected to behave like a search engine — and most of their authors don't want to build the retrieval stack behind it. Capital confirmed the read: Mem0 raised $24M for agent memory with AWS naming it the exclusive memory provider for its Agent SDK[17], Letta took $10M from Felicis[18], and Glean re-rated to $7.2B on the enterprise side.[19] The context layer is no longer a thesis; it's a funded category — and the open, source-of-record retrieval slot in it is the one Airweave occupies.
How It Works
Five stages. One API. The agent's data plane in production by week two.
The shape that wins is one API, two surfaces, one permission model.
One API. Every agent builder wants the same thing — a single search endpoint that returns the right rows from across a customer's tools. Airweave's contract is that endpoint, with the connector graph behind it and the ACL filter at query time.[1]
Two surfaces. REST for traditional apps; MCP server for agents. Same data plane, two interfaces — and the MCP interface is exactly the shape OpenAI's docs now mandate.[6] Builders adopting MCP this year inherit Airweave's retrieval for free.
One permission model. Every connector replicates the source app's ACL into Airweave's index. Queries filter by the end user's source-app permissions in real time. No accidental cross-tenant leakage, no agent returning rows the user can't see in Slack itself. The unsexy work that becomes the customer's procurement requirement at deal #20.
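The permission model described above can be sketched as a filter applied before any match is returned. Everything here is an assumption for illustration: the `IndexedChunk` shape, the `allowed_principals` field, and the substring match are hypothetical, and this is not Airweave's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    """Hypothetical index row carrying the ACL copied from the source app."""
    id: str
    tenant: str
    text: str
    allowed_principals: frozenset[str]  # user or group ids replicated at sync time


def visible_to(chunk: IndexedChunk, tenant: str, principals: set[str]) -> bool:
    # A chunk is visible only inside its own tenant, and only if the caller
    # holds at least one principal the source app granted access to.
    return chunk.tenant == tenant and bool(chunk.allowed_principals & principals)


def search(
    chunks: list[IndexedChunk], query: str, tenant: str, principals: set[str]
) -> list[str]:
    """Substring match restricted to chunks the end user may see."""
    needle = query.lower()
    return [
        c.id
        for c in chunks
        if visible_to(c, tenant, principals) and needle in c.text.lower()
    ]


chunks = [
    IndexedChunk("c1", "acme", "Q3 pricing draft", frozenset({"group:sales"})),
    IndexedChunk("c2", "acme", "Pricing for board only", frozenset({"user:ceo"})),
    IndexedChunk("c3", "globex", "Pricing sheet", frozenset({"group:sales"})),
]
result = search(chunks, "pricing", "acme", {"user:dana", "group:sales"})
```

Here `c2` is dropped because the caller lacks the principal, and `c3` is dropped because it belongs to another tenant, even though both match the query text.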
The Connector Matrix Is the Moat
The interesting part of Airweave isn't the vector search. It's the integrations folder.
Anyone can ship vector search. Nobody wants to maintain forty OAuth flows, schema-drift handlers, and ACL replicators. That's the work that compounds — and that's what Airweave is racing to be the default for.
Plaid for fintech. Merge for HRIS. Airweave for agent context.
The compounding work isn't novel — that's the point. OAuth handshakes for forty SaaS apps. Schema detection that survives the next API change. Incremental sync that doesn't lose deletes. Rate-limit handling that doesn't take down the customer's workspace. ACL replication so agent queries return what the end user is allowed to see. None of it is glamorous; all of it has to work.[3]
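"Incremental sync that doesn't lose deletes" is easy to state and easy to get wrong, so a small sketch may help. It assumes the source can list its current item ids with a version marker; many real APIs expose change feeds or tombstones instead, and the function names here are hypothetical.

```python
def plan_sync(
    source: dict[str, str], indexed: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare source versions with indexed versions.

    Both arguments map item id -> version marker (an etag, timestamp, or hash).
    Returns (ids to upsert, ids to delete), each sorted for determinism.
    """
    to_upsert = sorted(
        item_id
        for item_id, version in source.items()
        if indexed.get(item_id) != version  # new item or changed version
    )
    # Anything still indexed but gone from the source must be removed,
    # otherwise deleted messages keep surfacing in agent results.
    to_delete = sorted(item_id for item_id in indexed if item_id not in source)
    return to_upsert, to_delete


def apply_sync(
    source: dict[str, str], indexed: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Mutate `indexed` so it mirrors `source`; return what was changed."""
    to_upsert, to_delete = plan_sync(source, indexed)
    for item_id in to_upsert:
        indexed[item_id] = source[item_id]
    for item_id in to_delete:
        del indexed[item_id]
    return to_upsert, to_delete


indexed = {"a": "v1", "b": "v1", "c": "v1"}
source = {"a": "v1", "b": "v2", "d": "v1"}
upserted, deleted = apply_sync(source, indexed)
```

Only `b` (changed) and `d` (new) are re-indexed, `c` is removed, and `a` is left untouched, which is what keeps a sync incremental rather than a full rebuild.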
Switching costs compound with every connector. Once a customer is wired into your connector graph, swapping you out means re-doing OAuth for every source app, re-validating every permission model, and re-running an initial sync that might take hours. The cost compounds with every connector added, every customer onboarded, every workspace authenticated. That's the same shape that made Plaid and Merge durable — and the same shape Airweave is building one OSS pull request at a time.
OSS is the accelerator, not the threat. Each connector is a scoped contribution — exactly the unit of work the community is good at, and Airweave is already taking PRs against it. The open license recruits the long-tail integrations the team would never prioritize in a closed roadmap. The hosted plane runs everything reliably and absorbs the operational cost.
The vector DB is a feature. The connector graph is the product. Every agent in production will read from one — and Airweave is building the open one.
Market
Every agent in production needs context. Most of them will rent it.
The agentic AI tools market is forecast to grow from $6.2B in 2024 to $419B by 2034 at ~52% CAGR.[5] Retrieval is the unglamorous half of that — every agent in production needs to read customer data, and every team would rather buy than build the connector matrix. The bet is that the unified context layer for agents becomes infrastructure on the same scale as the database was for web apps.
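As a quick consistency check on the forecast, the compound annual growth rate implied by the two endpoints can be recomputed directly. The sketch below uses only the figures quoted above and the standard CAGR formula; the function name is illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


# $6.2B in 2024 growing to $419B by 2034, i.e. ten compounding years.
rate = cagr(6.2, 419.0, 2034 - 2024)
print(f"Implied CAGR: {rate:.1%}")
```

The result lands a little above 52%, in line with the ~52% CAGR cited.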
Inside YC alone, Composio is already a paying Airweave customer; design partners include teams at Cursor and Claude Desktop.[3] [4] Beachhead math: ~50k AI dev teams × $200–$2,000/mo on the cloud tier ⇒ $120M–$1.2B ARR in the near-term ICP before Airweave touches a single F500.
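The beachhead arithmetic can be reproduced in a few lines. The inputs are the memo's own figures; the variable names are illustrative, and the calculation assumes every one of the ~50k teams pays for the cloud tier, so it is a ceiling rather than a forecast.

```python
TEAMS = 50_000               # ~50k AI dev teams (memo's estimate)
PRICE_LOW_PER_MONTH = 200    # USD, low end of the cloud tier
PRICE_HIGH_PER_MONTH = 2_000  # USD, high end of the cloud tier
MONTHS_PER_YEAR = 12

arr_low = TEAMS * PRICE_LOW_PER_MONTH * MONTHS_PER_YEAR
arr_high = TEAMS * PRICE_HIGH_PER_MONTH * MONTHS_PER_YEAR

print(f"ARR range: ${arr_low / 1e6:,.0f}M to ${arr_high / 1e9:,.1f}B")
```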
The demand side is no longer hypothetical. Mem0 — the adjacent memory layer, same buyer — reported API calls growing from 35M in Q1 2025 to 186M in Q3 2025, a 5.3x jump in two quarters.[17] That's the consumption curve of agent-context infrastructure being wired into production, and retrieval over systems of record sits one layer below memory in the same stack.
Every YC AI company is a retrieval problem in waiting. Airweave should be the answer by default — and that's how the next generation of agent infrastructure gets written.
Competitive landscape
Five categories of competition. Airweave is positioned against all of them.
Each category has a structural limitation — sales motion, source model, or stack depth. Airweave's OSS + connector-graph + agents-first stance is the answer to all five.
Chart: Disclosed funding across the agent context stack
One announced round per company, not cumulative. Seed-stage capital (Airweave $6M, Letta $10M, Mem0 $24M) is pricing the open infrastructure slots while Glean's $150M Series F at $7.2B marks the enterprise endgame for the same connector + permission surface.[16] [17] [18] [19]
Source · Company announcements & TechCrunch / PRNewswire coverage, Sep 2024 – Oct 2025
The closed stacks defend the host product. The vector DBs sell a primitive. Glean sells procurement. The open, vendor-agnostic context plane for agents is wide open — and Airweave is the company writing it.
Founder deep dive
Two founders, one obsession with the unglamorous half of the data stack.
Founder & team
Risks & mitigations
What we're watching
References
- [1] Airweave: Product homepage
- [2] Y Combinator: Airweave company profile
- [3] GitHub: airweave-ai/airweave (MIT, 6.4k stars)
- [4] Show HN: Airweave (176 points / 40 comments)
- [5] Market.us: Agentic AI Tools Market ($6.2B → $419B, 52% CAGR)
- [6] OpenAI: Remote MCP Server guide (search + fetch spec)
- [7] a16z: Yoko Li, A Deep Dive Into MCP and the Future of AI Tooling
- [8] YC X25 Product Showcase: Airweave winner, Sam Altman meeting
- [9] YC Launches: "Airweave: Let agents search any app"
- [10] Anthropic: Model Context Protocol introduction
- [11] Glean: Enterprise AI search (work AI platform)
- [12] Pinecone: Vector database for AI applications
- [13] Vectara: Generative AI search platform
- [14] Carbon: Universal connectors for LLM apps
- [15] Mendable.ai: AI search for technical content
- [16] Airweave: $6M seed round led by FCVC (Lux Capital, YC, Orange Collective, Pioneer Fund, Shay Banon), Jul 2, 2025
- [17] TechCrunch: Mem0 raises $24M from YC, Peak XV and Basis Set to build the memory layer for AI apps (Oct 28, 2025)
- [18] PRNewswire: Letta (MemGPT) raises $10M seed led by Felicis (Sep 2024)
- [19] TechCrunch: Glean lands $150M Series F at a $7.2B valuation (Jun 10, 2025)
- [20] PulseMCP: MCP server directory (17,974 servers listed as of Jun 12, 2026)
- [21] Digital Applied: MCP Ecosystem H1 2026 Retrospective (~6,800 tracked servers YE 2025 → ~9,400 mid-Apr 2026)



