Airweave

The open-source context retrieval layer for AI agents.

Airweave Launch Video[9]

Seed

Round

YC X25 · MIT-licensed

50+

Source connectors

Slack, Notion, Drive, Linear, Salesforce, Postgres…

$419B

Agentic AI tools TAM by 2034

From $6.2B in 2024 · ~52% CAGR

In production / design partners

Composio

Agent tooling · paying customer

Cursor

AI IDE · design partner

Claude Desktop

Agent client · design partner

Show HN

176 pts · 40 comments

Sam Altman

X25 Product Showcase winner

OSS adoption climbing — 6.4k GitHub stars, 800+ forks, active PR cadence.[3] [4]

Thesis

Every meaningful AI agent gets stuck at the same question: how do I give it our data? Slack, Notion, Drive, Linear, Salesforce, GitHub, Postgres — and every team rebuilds the same connector + sync + chunk + embed + search stack from scratch. Airweave is the open-source retrieval layer for agents: 50+ connectors, continuous sync, hybrid search, and a permission model built for multi-tenant agent retrieval, behind one API.[1] [3] The longer-term bet: Airweave becomes the system of record for what agents are allowed to see — the canonical context plane every production agent reads from.
  1. Retrieval is the new database for agents. Models are commoditizing; context is not. Every meaningful agent dies at "how do I give it our data?", and the answer is some private mash-up of ingestion code, OAuth plumbing, vector store, and ACL spaghetti. OpenAI's MCP launch made it official: every remote MCP server is expected to behave like a search engine, with search and fetch as the contract.[6] Airweave is that contract, ready out of the box.

  2. The connector matrix is the moat. Slack, Notion, Drive, Linear, Salesforce, GitHub, HubSpot, Postgres, Stripe, Zendesk, Confluence, Jira, Asana: 50+ integrations live, with the long tail still coming.[3] The work that compounds isn't the vector search; it's OAuth handshakes, schema drift handling, incremental sync, and ACL replication. Hard to clone in a weekend, harder to maintain at scale, and exactly the shape that turned Plaid and Merge into durable companies.

  3. OSS + agents-first wins distribution. MIT license, FastAPI + Vespa stack, Python and TypeScript SDKs, a CLI, and a hosted cloud.[3] Same playbook Vercel ran for deployment and Supabase ran for Postgres: ship a clean OSS surface, let coding agents recommend it, build the managed plane on top. The category default is whatever ChatGPT tells the next agent dev to pip install.

  4. Permissions and freshness are the things teams underestimate. Anyone can ship vector search in a weekend. Multi-tenant ACLs, incremental re-indexing on schema changes, conflict resolution across overlapping sources: that's the part that breaks at customer #20. Airweave is designing the permission and freshness plane from day one, not retrofitting it after the first SOC 2 audit.

Problem

Every AI team rebuilds the same retrieval stack. None of it is novel; all of it has to work.

Connectors to a dozen SaaS apps. OAuth flows for each one. Schema detection that survives the next API change. Document chunking across DOCX, PDF, PPTX, XLSX. An embedding pipeline. A vector store. A search API. ACL enforcement so agent queries only return what the end user is allowed to see. Incremental sync so the index doesn't go stale by Friday. None of it is novel — but every agent team writes a half-finished version of it themselves, and the half-finished version is exactly what breaks the first time a customer's Slack has 200,000 messages.

The OpenAI MCP launch made the problem explicit. Every remote MCP server is now expected to implement search and fetch — to behave like a search engine over a customer's data.[6] That's the entire retrieval stack as a public contract: ingestion + indexing + ranking + access control. Every team adopting MCP either rebuilds it or rents it. Airweave is the rent option, open-sourced.
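
To make the search-and-fetch contract concrete, here is a minimal sketch of the two operations over an in-memory corpus. It is an illustration only: the class name, the document ids, and the term-matching score are assumptions for this example, and it uses neither Airweave's SDK nor a real MCP library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    text: str


class InMemoryRetriever:
    """Hypothetical stand-in for a backend that exposes `search` and `fetch`."""

    def __init__(self, documents):
        self._docs = {doc.doc_id: doc for doc in documents}

    def search(self, query: str, limit: int = 5) -> list[dict]:
        """Return lightweight hits ranked by how many query terms match."""
        terms = query.lower().split()
        scored = []
        for doc in self._docs.values():
            haystack = f"{doc.title} {doc.text}".lower()
            score = sum(1 for term in terms if term in haystack)
            if score > 0:
                scored.append((score, doc))
        # Highest score first; ties broken by id so the order is stable.
        scored.sort(key=lambda pair: (-pair[0], pair[1].doc_id))
        return [{"id": d.doc_id, "title": d.title} for _, d in scored[:limit]]

    def fetch(self, doc_id: str) -> dict:
        """Return the full record for an id obtained from `search`."""
        doc = self._docs[doc_id]
        return {"id": doc.doc_id, "title": doc.title, "text": doc.text}


retriever = InMemoryRetriever([
    Document("slack-1", "Launch checklist", "Ship the connector docs on Friday"),
    Document("notion-7", "Roadmap", "Connector roadmap and launch plan"),
    Document("drive-3", "Expenses", "Quarterly travel expenses"),
])
hits = retriever.search("connector launch")
first_full = retriever.fetch(hits[0]["id"])
```

The split matters for agents: search returns cheap identifiers and titles, and fetch is only called for the results the agent decides to read in full.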

The cost isn't writing the code once. It's the long tail — keeping forty connectors green as their APIs drift, keeping the index fresh as customer data changes, keeping permissions correct as agents query across workspaces. That's not a hackathon project; it's a company. And every agent founder we talked to would rather not own it.

50+

Connectors live

Slack, Notion, Drive, Linear, Salesforce, GitHub, Postgres, Stripe…

6.4k

GitHub stars

800+ forks · active PR cadence · MIT license

176

Show HN points

40 comments · design partners at Cursor & Claude Desktop

GitHub repo[3] · Show HN thread[4]

Why Now

MCP turned retrieval into a public protocol — and a category is forming around the open default.

Three trends are colliding in the same twelve months: MCP standardized the agent–data contract, the frontier labs are keeping their stacks closed, and every agent team is independently discovering the cost of building this themselves.

Your MCP remote server should resemble a search engine. Every server is expected to implement search and fetch.

OpenAI MCP docs[6]

Spec for remote MCP servers

Does everyone need to implement their own RAG for tools, or is there a layer waiting to be standardized?

Yoko Li[7]

Partner · a16z

Your agent gets a simple search endpoint. Your users get reliable, personalized experiences.

Lennert Jansen[2]

Co-founder · Airweave

Three preconditions converged in the same twelve months.

MCP turned retrieval into a protocol. OpenAI's remote MCP launch (Jun 4 2025) mandated search and fetch on every server — a public spec for what agents need from data sources.[6] Anthropic's MCP spec did the same on the client side.[10] What used to be a thousand bespoke integrations is now a single contract — and a single contract is exactly the surface a default platform can fill.

The connector matrix is being rebuilt everywhere, badly. Every agent startup we talked to has a half-finished integrations folder; every one of them would rather not own it. The work is repetitive, the maintenance is endless, and the differentiation is zero — the perfect shape for an OSS default to absorb. Composio is already a paying Airweave customer; Cursor and Claude Desktop are design partners.[3] The category is forming in real time.

Frontier labs are keeping the plumbing closed. ChatGPT Connectors, Claude's data integrations, Codex's VM: all internal-only.[6] As OpenAI and Anthropic evolve from model vendors to product companies, holding the retrieval middleware private preserves their UX edge. That creates the structural opening for an open, vendor-agnostic stack that every agent builder outside the walled gardens can adopt. It is the same dynamic that gave Vercel space against AWS and Supabase space against Firebase.

How It Works

Six stages. One API. The agent's data plane in production by week two.

Step 01

Connect — 50+ source apps

OAuth or API-key auth across Slack, Notion, Drive, Linear, Salesforce, GitHub, HubSpot, Stripe, Postgres, Zendesk, Confluence, Jira, Asana, Intercom, and dozens more. Multi-tenant by design: each end user authenticates their own workspace, Airweave isolates the data.

Step 02

Sync — continuous + incremental

Change detection on every connector. Schema drift is detected and handled; rate limits are absorbed at the connector layer; back-pressure is managed by the orchestrator (Temporal) so customer ingest doesn't degrade the search path.
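
One common way to implement incremental change detection is to store a content fingerprint per record and diff it against the latest source snapshot. The sketch below assumes that approach; the function names and the record shape are hypothetical, and the memo does not say this is how Airweave's connectors detect changes.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable content hash: identical records always hash the same."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_sync(indexed: dict, source: dict) -> dict:
    """Diff stored fingerprints (id -> hash) against a source snapshot (id -> record)."""
    plan = {"create": [], "update": [], "delete": []}
    for record_id, record in source.items():
        if record_id not in indexed:
            plan["create"].append(record_id)
        elif indexed[record_id] != fingerprint(record):
            plan["update"].append(record_id)
    # Anything still indexed but missing upstream was deleted at the source.
    plan["delete"] = [rid for rid in indexed if rid not in source]
    for ids in plan.values():
        ids.sort()
    return plan


previously_indexed = {
    "a": fingerprint({"text": "hello"}),
    "b": fingerprint({"text": "old"}),
    "c": fingerprint({"text": "gone"}),
}
current_source = {
    "a": {"text": "hello"},
    "b": {"text": "new"},
    "d": {"text": "fresh"},
}
plan = plan_sync(previously_indexed, current_source)
```

Only the records in the plan need to be re-chunked and re-embedded, which is what keeps a large workspace cheap to keep fresh. Tracking deletes explicitly is the part that is easiest to forget.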

Step 03

Chunk + Embed — document-aware

Chunking strategies tuned per file type — DOCX, PDF, PPTX, XLSX, HTML, TXT — plus structured records from Postgres or Salesforce. Vendor-neutral embedding; bring your own model, default to a sensible one.
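
As a baseline for what a chunking step does, here is a minimal fixed-window chunker with overlap. It is a generic sketch, not one of the per-file-type strategies described above; the function name and the default sizes are assumptions.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, size=4, overlap=1)
```

The overlap keeps a sentence that straddles a boundary retrievable from either side. A document-aware strategy would replace the word windows with structural units such as headings, slides, or spreadsheet rows.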

Step 04

Index — hybrid retrieval

Vespa for vector + lexical + structured filters. Postgres for metadata. Redis for pub/sub. Pluggable to Pinecone, Weaviate, or pgvector if the customer has an existing stack.
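
Hybrid retrieval has to merge a lexical ranking and a vector ranking into one list. The sketch below uses reciprocal rank fusion, a standard technique chosen here for illustration; the memo does not state which fusion method Airweave or its Vespa configuration uses, and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per id."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


lexical_hits = ["a", "b", "c"]  # e.g. keyword match order
vector_hits = ["b", "d", "a"]   # e.g. embedding similarity order
fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
```

Rank fusion only needs positions, not comparable scores, so it works even when the lexical and vector engines score on different scales.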

Step 05

Serve — REST + MCP

One search endpoint exposed as REST for apps and as an MCP server for agents — the same surface OpenAI is teaching millions of devs to build against. Permission-aware: results filtered by the end user's source-app access at query time.

Step 06

Operate — OSS + Cloud

MIT core, Python + TypeScript SDKs, CLI, hosted cloud with usage-based pricing. Self-host the connector graph, contract the managed plane — the procurement-friendly answer for both indie devs and security-sensitive enterprises.

The shape that wins is one API, two surfaces, one permission model.

One API. Every agent builder wants the same thing — a single search endpoint that returns the right rows from across a customer's tools. Airweave's contract is that endpoint, with the connector graph behind it and the ACL filter at query time.[1]

Two surfaces. REST for traditional apps; MCP server for agents. Same data plane, two interfaces — and the MCP interface is exactly the shape OpenAI's docs now mandate.[6] Builders adopting MCP this year inherit Airweave's retrieval for free.

One permission model. Every connector replicates the source app's ACL into Airweave's index. Queries filter by the end user's source-app permissions in real time. No accidental cross-tenant leakage, no agent returning rows the user can't see in Slack itself. The unsexy work that becomes the customer's procurement requirement at deal #20.
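
A query-time permission filter can be sketched as a tenant check plus an intersection between the caller's principals and the access list replicated from the source app. The data model below (tenant id, principal strings such as "group:sales") is an assumption for illustration, not Airweave's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    chunk_id: str
    tenant_id: str
    text: str
    allowed_principals: frozenset  # user and group ids copied from the source app


def filter_by_access(candidates, tenant_id, principals):
    """Keep only chunks in the caller's tenant that the caller may read."""
    caller = frozenset(principals)
    return [
        chunk
        for chunk in candidates
        if chunk.tenant_id == tenant_id and chunk.allowed_principals & caller
    ]


candidates = [
    IndexedChunk("c1", "acme", "Q3 pricing thread", frozenset({"group:sales"})),
    IndexedChunk("c2", "acme", "Board deck draft", frozenset({"user:ceo"})),
    IndexedChunk("c3", "globex", "Q3 pricing notes", frozenset({"group:sales"})),
]
visible = filter_by_access(candidates, "acme", {"user:dana", "group:sales"})
```

Here only c1 survives: c2 is in the right tenant but not shared with the caller, and c3 matches the group name but belongs to another tenant. In a real index the same predicate would typically be pushed down as a filter on the search query rather than applied after retrieval.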

The Connector Matrix Is the Moat

The interesting part of Airweave isn't the vector search. It's the integrations folder.

Anyone can ship vector search. Nobody wants to maintain forty OAuth flows, schema-drift handlers, and ACL replicators. That's the work that compounds — and that's what Airweave is racing to be the default for.

Plaid for fintech. Merge for HRIS. Airweave for agent context.

The compounding work isn't novel — that's the point. OAuth handshakes for forty SaaS apps. Schema detection that survives the next API change. Incremental sync that doesn't lose deletes. Rate-limit handling that doesn't take down the customer's workspace. ACL replication so agent queries return what the end user is allowed to see. None of it is glamorous; all of it has to work.[3]

Switching costs compound with every connector. Once a customer is wired into your connector graph, swapping you out means re-doing OAuth for every source app, re-validating every permission model, and re-running an initial sync that might take hours. The cost compounds with every connector added, every customer onboarded, every workspace authenticated. That's the same shape that made Plaid and Merge durable — and the same shape Airweave is building one OSS pull request at a time.

OSS is the accelerator, not the threat. Each connector is a scoped contribution — exactly the unit of work the community is good at, and Airweave is already taking PRs against it. The open license recruits the long-tail integrations the team would never prioritize in a closed roadmap. The hosted plane runs everything reliably and absorbs the operational cost.

The vector DB is a feature. The connector graph is the product. Every agent in production will read from one — and Airweave is building the open one.
Orange Collective

Market

Every agent in production needs context. Most of them will rent it.

The agentic AI tools market is forecast to grow from $6.2B in 2024 to $419B by 2034 at ~52% CAGR.[5] Retrieval is the unglamorous half of that — every agent in production needs to read customer data, and every team would rather buy than build the connector matrix. The bet is that the unified context layer for agents becomes infrastructure on the same scale as the database was for web apps.
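
The growth rate can be checked from the two endpoints. This short calculation uses only the figures quoted above ($6.2B in 2024, $419B in 2034); the function name is ours.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


implied_rate = cagr(6.2, 419.0, 2034 - 2024)
print(f"Implied CAGR: {implied_rate:.1%}")
```

The result comes out at roughly 52% per year, consistent with the cited forecast.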

Inside YC alone, Composio is already a paying Airweave customer; design partners include teams at Cursor and Claude Desktop.[3] [4] Beachhead math: ~50k AI dev teams × $200–$2,000/mo on the cloud tier ⇒ $120M–$1.2B ARR in the near-term ICP before Airweave touches a single F500.
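
The beachhead range follows directly from the stated inputs (about 50k teams paying $200 to $2,000 per month). The calculation below spells it out; it assumes every team in the pool converts, which is an upper bound rather than a forecast.

```python
def annual_recurring_revenue(teams: int, monthly_price_usd: int) -> int:
    """ARR if every team pays the given monthly price for twelve months."""
    return teams * monthly_price_usd * 12


low_arr = annual_recurring_revenue(50_000, 200)     # 120,000,000
high_arr = annual_recurring_revenue(50_000, 2_000)  # 1,200,000,000
print(f"${low_arr / 1e6:.0f}M to ${high_arr / 1e9:.1f}B ARR")
```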

Near term — AI agent builders

YC current and recent cohorts plus the broader AI-native seed-to-Series-A pool. Dense network, technical buyers, OSS-friendly. Composio paying today; Cursor and Claude Desktop integrated as design partners. Every new MCP server shipped is a potential Airweave install.[3] [4]

Long term — every software business with agents

Agentic AI tools: $6.2B (2024) → $419B (2034) at ~52% CAGR.[5] Within that, the retrieval / context plane is the layer every agent reads from. As enterprises move from "we have an AI strategy" to "we have agents in production," the connector graph + permission plane becomes a procurement line item — and Airweave is building the open default.

Every YC AI company is a retrieval problem in waiting. Airweave should be the answer by default — and that's how the next generation of agent infrastructure gets written.
Orange Collective

Competitive landscape

Four categories of competition. Airweave is positioned against all of them.

Each category has a structural limitation — sales motion, source model, or stack depth. Airweave's OSS + connector-graph + agents-first stance is the answer to all four.

Glean

Closed enterprise search

The incumbent answer for enterprise search across SaaS apps. Sales-led, six-figure procurement cycles, oriented toward IT and knowledge management — not toward an agent builder integrating a search endpoint at week two.[11] Closed-source, no developer surface, no MCP. The wrong shape for the agent-first cohort, but a useful proof point on the eventual feature surface.

Vectara · Mendable · Carbon

Managed RAG platforms

Hosted retrieval APIs with pre-built ingestion. Strong on the vector side, thinner on the connector matrix and the multi-tenant ACL story. Closed-source.[13] [14] [15] Airweave's wedge is the OSS distribution, the breadth of source apps, and the permission plane — not the search algorithm itself.

Pinecone · Weaviate · pgvector

Vector databases

The other half of the stack. Pinecone et al. solve the storage and retrieval primitive; they don't solve sync, connectors, or permissions.[12] Airweave is composable with all of them — customers can plug their preferred vector store underneath. The vector DB is a feature; the connector graph is the product.

OpenAI · Anthropic

Closed frontier-lab stacks

ChatGPT Connectors and Claude's data integrations are the closed, vendor-locked version of this stack.[6] [10] Powerful inside the host product, unavailable to anyone building a vendor-agnostic agent. As the labs become product companies, holding the retrieval plumbing internal preserves their UX edge — and creates the gap Airweave fills outside their walled gardens.

The closed stacks defend the host product. The vector DBs sell a primitive. Glean sells procurement. The open, vendor-agnostic context plane for agents is wide open — and Airweave is the company writing it.
Orange Collective

Founder deep dive

Two founders, one obsession with the unglamorous half of the data stack.

Why Lennert built it. Lennert has been working on retrieval and language models since 2020 — through Amazon and IBM AI research, watching the same RAG-by-hand pattern get reinvented at every company. The MCP launch wasn't a surprise to him; he had been arguing for years that retrieval needed to standardize. He just decided to be the person who shipped the standard.

Why Rauf built it. Rauf spent his career inside the unglamorous half of the data stack — building ingestion, orchestration, and ETL at fast-growing startups and enterprises. He had felt every part of the connector-matrix problem firsthand: the OAuth flows, the schema drift, the rate-limit handling, the ACL bookkeeping. Airweave is the consolidated version of every system he had to rebuild from scratch.

Why this team is the right team. An LLM researcher who has lived inside retrieval since 2020 meets a data platform engineer who has built the unsexy parts of the pipeline at scale. The split is clean: Lennert owns the retrieval surface and the model-side defensibility; Rauf owns the connector matrix and the orchestration layer. They cover both halves of the problem — and they've been friends since university, seven years before they wrote the first line of Airweave code together.

Velocity is a feature. The connector count went from a dozen to 50+ inside the YC batch.[3] The hosted cloud shipped the same quarter as the MIT core. Show HN landed 176 points with 40 substantive technical comments and turned into design-partner conversations at Cursor and Claude Desktop the same week.[4] Airweave was selected as a winner of the YC X25 Product Showcase — voted on internally by partners and batchmates — and Lennert and Rauf were invited to spend time with Sam Altman as a result.[8]

The long arc. Airweave becomes the system of record for what agents are allowed to see. Every connector authenticated, every workspace synced, every permission replicated flows through one platform. The OSS core wins distribution; the hosted plane powers the renewal; the long-term moat is the operational memory of how thousands of AI agents actually read the world's enterprise data.

Founder & team

Lennert Jansen

Co-founder & CEO

LLM and retrieval researcher since 2020. Previously at Amazon and IBM on language models and search. University friends with Rauf for seven years before starting Airweave together.

Rauf Akdemir

Co-founder & CTO

Data platform engineer at high-growth startups and enterprise. Built the ingestion and orchestration systems Airweave's connector matrix sits on top of. Computer science background; obsessed with the unglamorous half of the data stack.

Risks & mitigations

Risk

Frontier labs build their own retrieval layer and bundle it into ChatGPT, Claude, and Gemini — closing the surface before an open default forms.

Mitigation

The labs are doing exactly that, and it's the strongest validation of the category. Their stacks are closed and tied to their host products; Airweave is the open, vendor-agnostic version that every agent builder outside the walled gardens can adopt. The same dynamic that gave Vercel space against AWS and Supabase space against Firebase applies here: developers and platform teams structurally prefer the option they can self-host, fork, and audit.

Risk

OSS monetization is brittle. The MIT core gets adopted; the hosted cloud and premium connectors are what have to convert.

Mitigation

Same playbook Vercel ran on Next.js and Supabase ran on Postgres: OSS distribution wins the install, the managed plane wins the renewal. Airweave's wedge to paid is multi-tenant orchestration, premium connectors, hosted vector storage, and the ACL plane — the work an engineering team won't run on their own laptop in production. Cloud self-serve already shipped; the conversion shape is forming.

Risk

Connector breadth is real work that scales with engineering headcount, not with the OSS flywheel. Fifty connectors today; the long tail is hundreds.

Mitigation

Two compounding effects. (1) Each connector is a small, scoped contribution — exactly the unit of work the OSS community is good at, and Airweave is already taking PRs against it. (2) Connector quality, not raw count, is the moat: incremental sync, schema drift handling, ACL replication. The right comparison is Merge or Nango, where the integrations engineering org becomes the durable business — and the open license accelerates it instead of slowing it down.

Risk

Enterprise procurement — SOC 2 Type 2, fine-grained RBAC, on-prem and VPC deploys, GDPR — takes years to mature and gates the largest deals.

Mitigation

Airweave's near-term ICP isn't the F500 procurement loop; it's the YC and Seed-to-Series-A AI cohort, where the CTO buys product in an afternoon. SOC 2 and RBAC roll out on the same timeline that won Stripe and Datadog their early-day reputations. The OSS license is itself a procurement-friendly answer for security-sensitive buyers: self-host the core, contract the cloud.

What we're watching

  • Connector count crossing 100 — and the connector quality bar (incremental sync, ACL replication) holding as breadth grows.
  • MCP server adoption — Airweave being the default retrieval backend behind a meaningful share of new MCP servers shipped in 2026.
  • Cloud conversion from the OSS install base — the hosted plane's gross-margin profile and the size of the median paying account.
  • Enterprise pilots crossing $50k ACV — the first time the connector matrix and ACL story carry a real procurement cycle.

References

  [1] Airweave: Product homepage
  [2] Y Combinator: Airweave company profile
  [3] GitHub: airweave-ai/airweave (MIT, 6.4k stars)
  [4] Show HN: Airweave (176 points / 40 comments)
  [5] Market.us: Agentic AI Tools Market ($6.2B → $419B, 52% CAGR)
  [6] OpenAI: Remote MCP Server guide (search + fetch spec)
  [7] a16z: Yoko Li, A Deep Dive Into MCP and the Future of AI Tooling
  [8] YC X25 Product Showcase: Airweave winner, Sam Altman meeting
  [9] YC Launches: "Airweave: Let agents search any app"
  [10] Anthropic: Model Context Protocol introduction
  [11] Glean: Enterprise AI search (work AI platform)
  [12] Pinecone: Vector database for AI applications
  [13] Vectara: Generative AI search platform
  [14] Carbon: Universal connectors for LLM apps
  [15] Mendable.ai: AI search for technical content