
Airweave
The open-source context retrieval layer for AI agents.
Thesis
- 01
Retrieval is the new database for agents. Models are commoditizing; context is not. Every meaningful agent dies at "how do I give it our data?", and the answer is some private mash-up of ingestion code, OAuth plumbing, vector store, and ACL spaghetti. OpenAI's MCP launch made it official: every remote MCP server is expected to behave like a search engine, with search and fetch as the contract.[6] Airweave is that contract, ready out of the box.
- 02
The connector matrix is the moat. Slack, Notion, Drive, Linear, Salesforce, GitHub, HubSpot, Postgres, Stripe, Zendesk, Confluence, Jira, Asana: 50+ integrations live, with the long tail still coming.[3] The work that compounds isn't the vector search; it's OAuth handshakes, schema drift handling, incremental sync, and ACL replication. Hard to clone in a weekend, harder to maintain at scale, and exactly the shape that turned Plaid and Merge into durable companies.
- 03
OSS + agents-first wins distribution. MIT license, FastAPI + Vespa stack, Python and TypeScript SDKs, a CLI, and a hosted cloud.[3] Same playbook Vercel ran for deployment and Supabase ran for Postgres: ship a clean OSS surface, let coding agents recommend it, build the managed plane on top. The category default is whatever ChatGPT tells the next agent dev to pip install.
- 04
Permissions and freshness are the things teams underestimate. Anyone can ship vector search in a weekend. Multi-tenant ACLs, incremental re-indexing on schema changes, conflict resolution across overlapping sources: that's the part that breaks at customer #20. Airweave is designing the permission and freshness plane from day one, not retrofitting it after the first SOC 2 audit.
Problem
Every AI team rebuilds the same retrieval stack. None of it is novel; all of it has to work.
Connectors to a dozen SaaS apps. OAuth flows for each one. Schema detection that survives the next API change. Document chunking across DOCX, PDF, PPTX, XLSX. An embedding pipeline. A vector store. A search API. ACL enforcement so agent queries only return what the end user is allowed to see. Incremental sync so the index doesn't go stale by Friday. None of it is novel — but every agent team writes a half-finished version of it themselves, and the half-finished version is exactly what breaks the first time a customer's Slack has 200,000 messages.
The OpenAI MCP launch made the problem explicit. Every remote MCP server is now expected to implement search and fetch — to behave like a search engine over a customer's data.[6] That's the entire retrieval stack as a public contract: ingestion + indexing + ranking + access control. Every team adopting MCP either rebuilds it or rents it. Airweave is the rent option, open-sourced.
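To make the search-and-fetch contract concrete, here is a minimal sketch in plain Python. It is not Airweave's API and does not use an MCP library: the class name, the result fields (id, title, text, url), and the naive term-overlap ranking are all assumptions chosen for illustration. The point is only the shape: search returns lightweight hits, and fetch returns the full document for one hit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical document shape; field names are illustrative assumptions.
    id: str
    title: str
    text: str
    url: str


class InMemoryRetriever:
    """Toy stand-in for a retrieval backend that exposes search and fetch."""

    def __init__(self, documents):
        self._docs = {doc.id: doc for doc in documents}

    def search(self, query, limit=5):
        """Return lightweight hits ranked by how many query terms they contain."""
        terms = set(query.lower().split())
        scored = []
        for doc in self._docs.values():
            words = set(f"{doc.title} {doc.text}".lower().split())
            score = len(terms & words)
            if score > 0:
                scored.append((score, doc))
        # Highest overlap first; ties broken by id so results are deterministic.
        scored.sort(key=lambda pair: (-pair[0], pair[1].id))
        return [{"id": d.id, "title": d.title, "url": d.url} for _, d in scored[:limit]]

    def fetch(self, doc_id):
        """Return the full content of a single document by id."""
        doc = self._docs[doc_id]
        return {"id": doc.id, "title": doc.title, "text": doc.text, "url": doc.url}


retriever = InMemoryRetriever([
    Document("d1", "Refund policy", "Refunds are issued within 14 days", "https://example.com/d1"),
    Document("d2", "Onboarding guide", "How to set up your workspace", "https://example.com/d2"),
])
hits = retriever.search("refund policy")
full = retriever.fetch(hits[0]["id"])
```

A production server would sit behind the same two calls but replace the in-memory dictionary with ingestion, indexing, ranking, and access control, which is the stack the paragraph above describes.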
The cost isn't writing the code once. It's the long tail — keeping forty connectors green as their APIs drift, keeping the index fresh as customer data changes, keeping permissions correct as agents query across workspaces. That's not a hackathon project; it's a company. And every agent founder we talked to would rather not own it.
50+ connectors live: Slack, Notion, Drive, Linear, Salesforce, GitHub, Postgres, Stripe…
6.4k GitHub stars: 800+ forks · active PR cadence · MIT license
176 Show HN points: 40 comments · design partners at Cursor & Claude Desktop
Why Now
MCP turned retrieval into a public protocol — and a category is forming around the open default.
Three trends are colliding in the same twelve months: MCP standardized the agent–data contract, the frontier labs are keeping their stacks closed, and every agent team is independently discovering the cost of building this themselves.
Your MCP remote server should resemble a search engine. Every server is expected to implement search and fetch.
OpenAI MCP docs[6]
Spec for remote MCP servers
Does everyone need to implement their own RAG for tools, or is there a layer waiting to be standardized?
Yoko Li[7]
Partner · a16z
Your agent gets a simple search endpoint. Your users get reliable, personalized experiences.
Lennert Jansen[2]
Co-founder · Airweave
Three preconditions converged in the same twelve months.
MCP turned retrieval into a protocol. OpenAI's remote MCP launch (Jun 4 2025) mandated search and fetch on every server — a public spec for what agents need from data sources.[6] Anthropic's MCP spec did the same on the client side.[10] What used to be a thousand bespoke integrations is now a single contract — and a single contract is exactly the surface a default platform can fill.
The connector matrix is being rebuilt everywhere, badly. Every agent startup we talked to has a half-finished integrations folder; every one of them would rather not own it. The work is repetitive, the maintenance is endless, and the differentiation is zero — the perfect shape for an OSS default to absorb. Composio is already a paying Airweave customer; Cursor and Claude Desktop are design partners.[3] The category is forming in real time.
Frontier labs are keeping the plumbing closed. ChatGPT Connectors, Claude's data integrations, Codex's VM — all internal-only.[6] As OpenAI and Anthropic evolve from model vendors to product companies, holding the retrieval middleware private preserves their UX edge. That creates the structural opening for an open, vendor-agnostic stack that every agent builder outside the walled gardens can adopt. The same dynamic that gave Vercel space against AWS and Supabase space against Firebase.
How It Works
Five stages. One API. The agent's data plane in production by week two.
The shape that wins is one API, two surfaces, one permission model.
One API. Every agent builder wants the same thing — a single search endpoint that returns the right rows from across a customer's tools. Airweave's contract is that endpoint, with the connector graph behind it and the ACL filter at query time.[1]
Two surfaces. REST for traditional apps; MCP server for agents. Same data plane, two interfaces — and the MCP interface is exactly the shape OpenAI's docs now mandate.[6] Builders adopting MCP this year inherit Airweave's retrieval for free.
One permission model. Every connector replicates the source app's ACL into Airweave's index. Queries filter by the end user's source-app permissions in real time. No accidental cross-tenant leakage, no agent returning rows the user can't see in Slack itself. The unsexy work that becomes the customer's procurement requirement at deal #20.
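The permission model described above can be sketched as a filter applied at query time. This is an illustration under stated assumptions, not Airweave's implementation: the IndexedChunk type, the "group:..." principal strings, and the keyword match standing in for ranking are all hypothetical. Each indexed chunk carries the set of principals copied from the source app, and the filter runs before matching so a hidden chunk never reaches the ranking step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    # Hypothetical index record; allowed_principals mirrors the source app's ACL.
    chunk_id: str
    text: str
    allowed_principals: frozenset


def search_with_acl(chunks, query, user_principals):
    """Return ids of chunks matching the query that the user is allowed to see."""
    principals = frozenset(user_principals)
    query_terms = set(query.lower().split())
    # Filter by permission first, so invisible chunks never enter ranking.
    visible = [c for c in chunks if c.allowed_principals & principals]
    return [c.chunk_id for c in visible if query_terms & set(c.text.lower().split())]


chunks = [
    IndexedChunk("c1", "Q3 roadmap for the search team", frozenset({"group:eng"})),
    IndexedChunk("c2", "Q3 compensation review", frozenset({"group:hr"})),
    IndexedChunk("c3", "Q3 all-hands notes", frozenset({"group:eng", "group:hr"})),
]
eng_results = search_with_acl(chunks, "q3", {"group:eng"})
hr_results = search_with_acl(chunks, "q3", {"group:hr"})
anonymous_results = search_with_acl(chunks, "q3", set())
```

The same query returns different rows for different users, and a caller with no principals gets nothing back.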
The Connector Matrix Is the Moat
The interesting part of Airweave isn't the vector search. It's the integrations folder.
Anyone can ship vector search. Nobody wants to maintain forty OAuth flows, schema-drift handlers, and ACL replicators. That's the work that compounds — and that's what Airweave is racing to be the default for.
Plaid for fintech. Merge for HRIS. Airweave for agent context.
The compounding work isn't novel — that's the point. OAuth handshakes for forty SaaS apps. Schema detection that survives the next API change. Incremental sync that doesn't lose deletes. Rate-limit handling that doesn't take down the customer's workspace. ACL replication so agent queries return what the end user is allowed to see. None of it is glamorous; all of it has to work.[3]
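"Incremental sync that doesn't lose deletes" is easier to see in code. The sketch below assumes a hypothetical change feed in which every change carries a monotonically increasing sequence number and deletions arrive as tombstone records; real source APIs differ widely, and nothing here reflects a specific connector. The sync applies only changes newer than the stored cursor, so re-running it is safe.

```python
def apply_incremental_sync(index, changes, cursor):
    """Apply changes with seq greater than cursor to index; return the new cursor.

    index   -- dict mapping item id to text (the local search index, simplified)
    changes -- iterable of dicts with keys: seq, id, deleted, text (assumed shape)
    cursor  -- highest sequence number already applied
    """
    new_cursor = cursor
    for change in sorted(changes, key=lambda c: c["seq"]):
        if change["seq"] <= cursor:
            continue  # already applied in an earlier run
        if change["deleted"]:
            # Tombstone: remove the item so deleted data stops being searchable.
            index.pop(change["id"], None)
        else:
            index[change["id"]] = change["text"]
        new_cursor = change["seq"]
    return new_cursor


index = {"msg-1": "hello"}
changes = [
    {"seq": 1, "id": "msg-1", "deleted": False, "text": "hello"},
    {"seq": 2, "id": "msg-2", "deleted": False, "text": "world"},
    {"seq": 3, "id": "msg-1", "deleted": True, "text": None},
]
cursor = apply_incremental_sync(index, changes, cursor=1)
# Replaying the same feed with the updated cursor changes nothing.
cursor_after_replay = apply_incremental_sync(index, changes, cursor)
```

Sources that do not emit tombstones would need a different approach, such as periodically comparing the full set of ids, which is part of why the per-connector work adds up.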
Switching costs compound with every connector. Once a customer is wired into your connector graph, swapping you out means re-doing OAuth for every source app, re-validating every permission model, and re-running an initial sync that might take hours. The cost compounds with every connector added, every customer onboarded, every workspace authenticated. That's the same shape that made Plaid and Merge durable — and the same shape Airweave is building one OSS pull request at a time.
OSS is the accelerator, not the threat. Each connector is a scoped contribution — exactly the unit of work the community is good at, and Airweave is already taking PRs against it. The open license recruits the long-tail integrations the team would never prioritize in a closed roadmap. The hosted plane runs everything reliably and absorbs the operational cost.
The vector DB is a feature. The connector graph is the product. Every agent in production will read from one — and Airweave is building the open one.
Market
Every agent in production needs context. Most of them will rent it.
The agentic AI tools market is forecast to grow from $6.2B in 2024 to $419B by 2034 at ~52% CAGR.[5] Retrieval is the unglamorous half of that — every agent in production needs to read customer data, and every team would rather buy than build the connector matrix. The bet is that the unified context layer for agents becomes infrastructure on the same scale as the database was for web apps.
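As a quick sanity check on the forecast, the growth rate implied by the two endpoints can be recomputed. The only inputs are the figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the text, in USD billions, over the 2024 to 2034 span.
start_usd_bn = 6.2
end_usd_bn = 419.0
years = 2034 - 2024

# Compound annual growth rate: (end / start) ** (1 / years) - 1
cagr = (end_usd_bn / start_usd_bn) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")
```

The result lands at roughly 52%, consistent with the cited rate.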
Inside YC alone, Composio is already a paying Airweave customer; design partners include teams at Cursor and Claude Desktop.[3] [4] Beachhead math: ~50k AI dev teams × $200–$2,000/mo on the cloud tier ⇒ $120M–$1.2B ARR in the near-term ICP before Airweave touches a single F500.
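The beachhead arithmetic can be reproduced in a few lines. The inputs are the figures from the paragraph above; note that the range corresponds to every one of the ~50k teams paying, so it is best read as a ceiling for that segment rather than a forecast.

```python
# Inputs taken from the text; names are illustrative.
teams = 50_000
monthly_price_low_usd = 200
monthly_price_high_usd = 2_000
months_per_year = 12

arr_low_usd = teams * monthly_price_low_usd * months_per_year
arr_high_usd = teams * monthly_price_high_usd * months_per_year
print(f"ARR range: ${arr_low_usd / 1e6:,.0f}M to ${arr_high_usd / 1e9:,.1f}B")
```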
Every YC AI company is a retrieval problem in waiting. Airweave should be the answer by default — and that's how the next generation of agent infrastructure gets written.
Competitive landscape
Four categories of competition. Airweave is positioned against all of them.
Each category has a structural limitation — sales motion, source model, or stack depth. Airweave's OSS + connector-graph + agents-first stance is the answer to all four.
The closed stacks defend the host product. The vector DBs sell a primitive. Glean sells procurement. The open, vendor-agnostic context plane for agents is wide open — and Airweave is the company writing it.
Founder deep dive
Two founders, one obsession with the unglamorous half of the data stack.
Founder & team
Risks & mitigations
What we're watching
References
- [1] Airweave: Product homepage
- [2] Y Combinator: Airweave company profile
- [3] GitHub: airweave-ai/airweave (MIT, 6.4k stars)
- [4] Show HN: Airweave (176 points / 40 comments)
- [5] Market.us: Agentic AI Tools Market ($6.2B → $419B, 52% CAGR)
- [6] OpenAI: Remote MCP Server guide (search + fetch spec)
- [7] a16z: Yoko Li, A Deep Dive Into MCP and the Future of AI Tooling
- [8] YC X25 Product Showcase: Airweave winner, Sam Altman meeting
- [9] YC Launches: "Airweave: Let agents search any app"
- [10] Anthropic: Model Context Protocol introduction
- [11] Glean: Enterprise AI search (work AI platform)
- [12] Pinecone: Vector database for AI applications
- [13] Vectara: Generative AI search platform
- [14] Carbon: Universal connectors for LLM apps
- [15] Mendable.ai: AI search for technical content


