Airweave — Investment Memo · Orange Collective

Thesis

Every meaningful AI agent gets stuck at the same question: how do I give it our data? Slack, Notion, Drive, Linear, Salesforce, GitHub, Postgres — and every team rebuilds the same connector + sync + chunk + embed + search stack from scratch. Airweave is the open-source retrieval layer for agents: 50+ connectors, continuous sync, hybrid search, and a permission model built for multi-tenant agent retrieval, behind one API.^[1] ^[3] A year after we wrote the first check, the market has priced the layer: Airweave closed a $6M seed led by FCVC with Lux Capital, YC, and Shay Banon participating (Jul 2025)^[16], Mem0 raised $24M for the adjacent memory layer (Oct 2025)^[17], and Glean re-rated to $7.2B (Jun 2025)^[19] — three different entries into the same thesis. The longer-term bet is unchanged: Airweave becomes the system of record for what agents are allowed to see — the canonical context plane every production agent reads from.

01
Retrieval is the new database for agents. Models are commoditizing; context is not. Every meaningful agent dies at "how do I give it our data?" — and the answer is some private mash-up of ingestion code, OAuth plumbing, vector store, and ACL spaghetti. OpenAI's MCP launch made it official: every remote MCP server is expected to behave like a search engine, with search and fetch as the contract.^[6] Airweave is that contract, ready out of the box.
02
The connector matrix is the moat. Slack, Notion, Drive, Linear, Salesforce, GitHub, HubSpot, Postgres, Stripe, Zendesk, Confluence, Jira, Asana — 50+ integrations live, with the long tail still coming.^[3] The work that compounds isn't the vector search; it's OAuth handshakes, schema drift handling, incremental sync, and ACL replication. Hard to clone in a weekend, harder to maintain at scale, and exactly the shape that turned Plaid and Merge into durable companies.
03
OSS + agents-first wins distribution. MIT license, FastAPI + Vespa stack, Python and TypeScript SDKs, a CLI, and a hosted cloud.^[3] Same playbook Vercel ran for deployment and Supabase ran for Postgres: ship a clean OSS surface, let coding agents recommend it, build the managed plane on top. The category default is whatever ChatGPT tells the next agent dev to pip install.
04
Permissions and freshness are the things teams underestimate. Anyone can ship vector search in a weekend. Multi-tenant ACLs, incremental re-indexing on schema changes, conflict resolution across overlapping sources — that's the part that breaks at customer #20. Airweave is designing the permission and freshness plane from day one, not retrofitting it after the first SOC 2 audit.

Problem

Every AI team rebuilds the same retrieval stack. None of it is novel; all of it has to work.

Connectors to a dozen SaaS apps. OAuth flows for each one. Schema detection that survives the next API change. Document chunking across DOCX, PDF, PPTX, XLSX. An embedding pipeline. A vector store. A search API. ACL enforcement so agent queries only return what the end user is allowed to see. Incremental sync so the index doesn't go stale by Friday. None of it is novel — but every agent team writes a half-finished version of it themselves, and the half-finished version is exactly what breaks the first time a customer's Slack has 200,000 messages.

The OpenAI MCP launch made the problem explicit. Every remote MCP server is now expected to implement search and fetch — to behave like a search engine over a customer's data.^[6] That's the entire retrieval stack as a public contract: ingestion + indexing + ranking + access control. Every team adopting MCP either rebuilds it or rents it. Airweave is the rent option, open-sourced.
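To make the search and fetch contract described above concrete, here is a minimal sketch over an in-memory document set. The class name, field names, and keyword-overlap scoring are assumptions chosen for illustration; this is not the MCP wire format and not Airweave's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    text: str


class ToySearchServer:
    """Hypothetical in-memory stand-in for a server exposing search and fetch."""

    def __init__(self, documents):
        self._docs = {d.doc_id: d for d in documents}

    def search(self, query, limit=5):
        """Return ids and titles of documents ranked by shared query terms."""
        terms = set(query.lower().split())
        scored = []
        for doc in self._docs.values():
            words = set((doc.title + " " + doc.text).lower().split())
            overlap = len(terms & words)
            if overlap:
                scored.append((overlap, doc))
        # Highest overlap first; ties broken by id for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1].doc_id))
        return [{"id": d.doc_id, "title": d.title} for _, d in scored[:limit]]

    def fetch(self, doc_id):
        """Return the full document for an id returned by search."""
        doc = self._docs[doc_id]
        return {"id": doc.doc_id, "title": doc.title, "text": doc.text}


server = ToySearchServer([
    Document("slack-1", "Launch thread", "pricing launch moved to friday"),
    Document("notion-7", "Pricing notes", "enterprise pricing tiers and discounts"),
    Document("linear-3", "Bug triage", "sync job fails on deleted channels"),
])
hits = server.search("pricing launch")
top = server.fetch(hits[0]["id"])
```

Even in this toy form, the two calls imply ingestion, indexing, and ranking behind them. A production version also needs access control and freshness, which is the part the memo argues teams would rather rent.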

The cost isn't writing the code once. It's the long tail — keeping forty connectors green as their APIs drift, keeping the index fresh as customer data changes, keeping permissions correct as agents query across workspaces. That's not a hackathon project; it's a company. And every agent founder we talked to would rather not own it.

50+ connectors live: Slack, Notion, Drive, Linear, Salesforce, GitHub, Postgres, Stripe…

6.4k GitHub stars: 805 forks · active PR cadence · MIT license

176 Show HN points: 40 comments · design partners at Cursor & Claude Desktop

GitHub repo^[3] · Show HN thread^[4]

Why Now

MCP turned retrieval into a public protocol — and a category is forming around the open default.

Three trends are colliding in the same twelve months: MCP standardized the agent–data contract, the frontier labs are keeping their stacks closed, and every agent team is independently discovering the cost of building this themselves.

"Your MCP remote server should resemble a search engine. Every server is expected to implement search and fetch."
OpenAI MCP docs^[6] · Spec for remote MCP servers

"Does everyone need to implement their own RAG for tools, or is there a layer waiting to be standardized?"
Yoko Li^[7] · Partner · a16z

"Your agent gets a simple search endpoint. Your users get reliable, personalized experiences."
Lennert Jansen^[2] · Co-founder · Airweave

Three preconditions converged in the same twelve months.

MCP turned retrieval into a protocol. OpenAI's remote MCP launch (Jun 4 2025) mandated search and fetch on every server — a public spec for what agents need from data sources.^[6] Anthropic's MCP spec did the same on the client side.^[10] What used to be a thousand bespoke integrations is now a single contract — and a single contract is exactly the surface a default platform can fill.

The connector matrix is being rebuilt everywhere, badly. Every agent startup we talked to has a half-finished integrations folder; every one of them would rather not own it. The work is repetitive, the maintenance is endless, and the differentiation is zero — the perfect shape for an OSS default to absorb. Composio is already a paying Airweave customer; Cursor and Claude Desktop are design partners.^[3] The category is forming in real time.

Frontier labs are keeping the plumbing closed. ChatGPT Connectors, Claude's data integrations, and Codex's VM are all internal-only.^[6] As OpenAI and Anthropic evolve from model vendors to product companies, holding the retrieval middleware private preserves their UX edge. That creates the structural opening for an open, vendor-agnostic stack that every agent builder outside the walled gardens can adopt. It is the same dynamic that gave Vercel space against AWS and Supabase space against Firebase.

A year on, the protocol bet resolved. PulseMCP alone lists 17,974 MCP servers as of June 2026^[20]; tracked counts across the major public registries grew roughly 38% in the four months to mid-April 2026 (~6,800 → ~9,400).^[21] Every one of those servers is expected to behave like a search engine — and most of their authors don't want to build the retrieval stack behind it. Capital confirmed the read: Mem0 raised $24M for agent memory with AWS naming it the exclusive memory provider for its Agent SDK^[17], Letta took $10M from Felicis^[18], and Glean re-rated to $7.2B on the enterprise side.^[19] The context layer is no longer a thesis; it's a funded category — and the open, source-of-record retrieval slot in it is the one Airweave occupies.

How It Works

Five stages. One API. The agent's data plane in production by week two.

The shape that wins is one API, two surfaces, one permission model.

One API. Every agent builder wants the same thing — a single search endpoint that returns the right rows from across a customer's tools. Airweave's contract is that endpoint, with the connector graph behind it and the ACL filter at query time.^[1]

Two surfaces. REST for traditional apps; MCP server for agents. Same data plane, two interfaces — and the MCP interface is exactly the shape OpenAI's docs now mandate.^[6] Builders adopting MCP this year inherit Airweave's retrieval for free.

One permission model. Every connector replicates the source app's ACL into Airweave's index. Queries filter by the end user's source-app permissions in real time. No accidental cross-tenant leakage, no agent returning rows the user can't see in Slack itself. The unsexy work that becomes the customer's procurement requirement at deal #20.
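The query-time permission filter described above can be sketched in a few lines. This is an illustrative model only: the `IndexedChunk` structure, the principal strings, and the substring match are assumptions, not Airweave's schema or ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    chunk_id: str
    source: str
    text: str
    allowed_principals: frozenset  # users/groups copied from the source app's ACL


def search_as_user(chunks, query, user_principals):
    """Keyword match restricted to chunks the user could open in the source app."""
    needle = query.lower()
    visible = []
    for chunk in chunks:
        # Permission check runs before matching, so hidden rows never reach ranking.
        if not (chunk.allowed_principals & user_principals):
            continue
        if needle in chunk.text.lower():
            visible.append(chunk.chunk_id)
    return visible


index = [
    IndexedChunk("c1", "slack", "Q3 renewal forecast for Acme", frozenset({"group:sales"})),
    IndexedChunk("c2", "drive", "Acme renewal contract draft", frozenset({"user:dana"})),
    IndexedChunk("c3", "notion", "Renewal playbook", frozenset({"group:everyone"})),
]

sales_rep = frozenset({"user:sam", "group:sales", "group:everyone"})
contractor = frozenset({"user:kit", "group:everyone"})

sales_results = search_as_user(index, "renewal", sales_rep)
contractor_results = search_as_user(index, "renewal", contractor)
```

The same query returns different rows for the two users because the filter is applied per request against the replicated ACL, not baked into the index at build time. A real system would also have to keep `allowed_principals` in sync as source-app permissions change.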

The Connector Matrix Is the Moat

The interesting part of Airweave isn't the vector search. It's the integrations folder.

Anyone can ship vector search. Nobody wants to maintain forty OAuth flows, schema-drift handlers, and ACL replicators. That's the work that compounds — and that's what Airweave is racing to be the default for.

Plaid for fintech. Merge for HRIS. Airweave for agent context.

The compounding work isn't novel — that's the point. OAuth handshakes for forty SaaS apps. Schema detection that survives the next API change. Incremental sync that doesn't lose deletes. Rate-limit handling that doesn't take down the customer's workspace. ACL replication so agent queries return what the end user is allowed to see. None of it is glamorous; all of it has to work.^[3]
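"Incremental sync that doesn't lose deletes" is the easiest item on that list to get wrong, so a small sketch may help. It assumes a hypothetical change feed in which every record carries a monotonically increasing `version` and a `deleted` flag; real connector APIs differ per source app.

```python
def incremental_sync(index, changes, cursor):
    """Apply source changes newer than cursor; return the new cursor.

    index:   dict mapping record id -> text (the searchable copy)
    changes: iterable of dicts with 'id', 'version', 'deleted', 'text'
    cursor:  highest version already applied
    """
    new_cursor = cursor
    for change in sorted(changes, key=lambda c: c["version"]):
        if change["version"] <= cursor:
            continue  # already applied in an earlier run
        if change["deleted"]:
            # Tombstone: remove the record so it can no longer be retrieved.
            index.pop(change["id"], None)
        else:
            index[change["id"]] = change["text"]
        new_cursor = change["version"]
    return new_cursor


index = {"msg-1": "old roadmap", "msg-2": "salary thread"}
feed = [
    {"id": "msg-1", "version": 11, "deleted": False, "text": "new roadmap"},
    {"id": "msg-2", "version": 12, "deleted": True, "text": None},
    {"id": "msg-3", "version": 13, "deleted": False, "text": "launch checklist"},
]
cursor = incremental_sync(index, feed, cursor=10)
```

After the run, `msg-2` is gone from the index instead of lingering as a stale hit, and replaying the same feed with the returned cursor changes nothing. Sources that do not emit tombstones need a periodic full reconciliation instead, which is part of why connector maintenance is ongoing work.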

Switching costs compound with every connector. Once a customer is wired into your connector graph, swapping you out means re-doing OAuth for every source app, re-validating every permission model, and re-running an initial sync that might take hours. The cost compounds with every connector added, every customer onboarded, every workspace authenticated. That's the same shape that made Plaid and Merge durable — and the same shape Airweave is building one OSS pull request at a time.

OSS is the accelerator, not the threat. Each connector is a scoped contribution — exactly the unit of work the community is good at, and Airweave is already taking PRs against it. The open license recruits the long-tail integrations the team would never prioritize in a closed roadmap. The hosted plane runs everything reliably and absorbs the operational cost.

Market

Every agent in production needs context. Most of them will rent it.

The agentic AI tools market is forecast to grow from $6.2B in 2024 to $419B by 2034 at ~52% CAGR.^[5] Retrieval is the unglamorous half of that — every agent in production needs to read customer data, and every team would rather buy than build the connector matrix. The bet is that the unified context layer for agents becomes infrastructure on the same scale as the database was for web apps.
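As a quick consistency check on the forecast above, the growth rate implied by the two endpoints can be recomputed directly. The helper name is illustrative; the inputs are the figures quoted in the paragraph.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.52 means 52%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the forecast cited above: $6.2B (2024) to $419B (2034).
implied = cagr(6.2, 419.0, 2034 - 2024)
implied_percent = round(implied * 100, 1)
```

The implied rate comes out a little above 52%, in line with the ~52% CAGR quoted.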

Inside YC alone, Composio is already a paying Airweave customer; design partners include teams at Cursor and Claude Desktop.^[3] ^[4] Beachhead math: ~50k AI dev teams × $200–$2,000/mo on the cloud tier × 12 months ⇒ $120M–$1.2B ARR at full penetration of the near-term ICP, before Airweave touches a single F500.
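The beachhead range can be reproduced with one multiplication per bound. The sketch below uses only the memo's inputs; the function name is illustrative.

```python
def annual_recurring_revenue(teams, monthly_price_usd):
    """ARR in USD for a given team count and monthly price per team."""
    return teams * monthly_price_usd * 12


TEAMS = 50_000  # the memo's ~50k AI dev teams
low = annual_recurring_revenue(TEAMS, 200)
high = annual_recurring_revenue(TEAMS, 2_000)
```

Here `low` is $120M and `high` is $1.2B. Both bounds assume every one of the ~50k teams pays, so the range is a ceiling on the segment, not a forecast of captured revenue.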

The demand side is no longer hypothetical. Mem0 — the adjacent memory layer, same buyer — reported API calls growing from 35M in Q1 2025 to 186M in Q3 2025, a 5.3x jump in two quarters.^[17] That's the consumption curve of agent-context infrastructure being wired into production, and retrieval over systems of record sits one layer below memory in the same stack.

Competitive landscape

Five categories of competition. Airweave is positioned against all of them.

Each category has a structural limitation — sales motion, source model, or stack depth. Airweave's OSS + connector-graph + agents-first stance is the answer to all five.

Glean

Closed enterprise search

The incumbent answer for enterprise search across SaaS apps — and the category's price anchor after its $150M Series F at a $7.2B valuation (Jun 2025, $100M+ ARR).^[11] ^[19] Sales-led, six-figure procurement cycles, oriented toward IT and knowledge management — not toward an agent builder integrating a search endpoint at week two. Closed-source, no developer surface, no MCP. The wrong shape for the agent-first cohort, but the clearest proof of what the connector + permission surface is worth at scale.

Mem0 · Zep · Letta

Agent memory layers

The best-funded adjacency: Mem0 raised $24M (Oct 2025) and was named AWS's exclusive memory provider for its Agent SDK^[17]; Letta took $10M from Felicis.^[18] But memory stores what the agent learned from interactions; Airweave retrieves what's true in the systems of record — Slack, Salesforce, Postgres — with the source app's permissions intact. Different write path, different read path, different moat (connector graph + ACL replication vs. recall quality). Complementary today; the watch item is convergence — whoever owns both planes owns the context budget.

The closed stacks defend the host product. The vector DBs sell a primitive. Glean sells procurement. The open, vendor-agnostic context plane for agents is wide open — and Airweave is the company writing it.

— Orange Collective

Founder deep dive

Two founders, one obsession with the unglamorous half of the data stack.

Why Lennert built it. Lennert has been working on retrieval and language models since 2020 — through Amazon and IBM AI research, watching the same RAG-by-hand pattern get reinvented at every company. The MCP launch wasn't a surprise to him; he had been arguing for years that retrieval needed to standardize. He just decided to be the person who shipped the standard.

Why Rauf built it. Rauf spent his career inside the unglamorous half of the data stack — building ingestion, orchestration, and ETL at fast-growing startups and enterprises. He had felt every part of the connector-matrix problem firsthand: the OAuth flows, the schema drift, the rate-limit handling, the ACL bookkeeping. Airweave is the consolidated version of every system he had to rebuild from scratch.

Why this team is the right team. An LLM researcher who has lived inside retrieval since 2020 meets a data platform engineer who has built the unsexy parts of the pipeline at scale. The split is clean: Lennert owns the retrieval surface and the model-side defensibility; Rauf owns the connector matrix and the orchestration layer. They cover both halves of the problem — and they've been friends since university, seven years before they wrote the first line of Airweave code together.

Velocity is a feature. The connector count went from a dozen to 50+ inside the YC batch.^[3] The hosted cloud shipped the same quarter as the MIT core. Show HN landed 176 points with 40 substantive technical comments and turned into design-partner conversations at Cursor and Claude Desktop the same week.^[4] Airweave was selected as a winner of the YC X25 Product Showcase — voted on internally by partners and batchmates — and Lennert and Rauf were invited to spend time with Sam Altman as a result.^[8]

The long arc. Airweave becomes the system of record for what agents are allowed to see. Every connector authenticated, every workspace synced, every permission replicated flows through one platform. The OSS core wins distribution; the hosted plane powers the renewal; the long-term moat is the operational memory of how thousands of AI agents actually read the world's enterprise data.

Founder & team

Lennert Jansen

Co-founder & CEO

LLM and retrieval researcher since 2020. Previously at Amazon and IBM on language models and search. University friends with Rauf for seven years before starting Airweave together.

Rauf Akdemir

Co-founder & CTO

Data platform engineer at high-growth startups and enterprise. Built the ingestion and orchestration systems Airweave's connector matrix sits on top of. Computer science background; obsessed with the unglamorous half of the data stack.

Risks & mitigations

Risk

Frontier labs build their own retrieval layer and bundle it into ChatGPT, Claude, and Gemini — closing the surface before an open default forms.

Mitigation

The labs are doing exactly that, and it's the strongest validation of the category. Their stacks are closed and tied to their host products; Airweave is the open, vendor-agnostic version that every agent builder outside the walled gardens can adopt. The same dynamic that gave Vercel space against AWS and Supabase space against Firebase is the dynamic here: developers and platform teams structurally prefer the option they can self-host, fork, and audit.

Risk

OSS monetization is brittle. The MIT core gets adopted; the hosted cloud and premium connectors are what have to convert.

Mitigation

Same playbook Vercel ran on Next.js and Supabase ran on Postgres: OSS distribution wins the install, the managed plane wins the renewal. Airweave's wedge to paid is multi-tenant orchestration, premium connectors, hosted vector storage, and the ACL plane — the work an engineering team won't run on their own laptop in production. Cloud self-serve already shipped; the conversion shape is forming.

Risk

Connector breadth is real work that scales with engineering headcount, not with the OSS flywheel. Fifty connectors today; the long tail is hundreds.

Mitigation

Two compounding effects. (1) Each connector is a small, scoped contribution — exactly the unit of work the OSS community is good at, and Airweave is already taking PRs against it. (2) Connector quality, not raw count, is the moat: incremental sync, schema drift handling, ACL replication. The right comparison is Merge or Nango, where the integrations engineering org becomes the durable business — and the open license accelerates it instead of slowing it down.

Risk

The agent memory layer converges into retrieval. Mem0 ($24M raised, AWS's exclusive memory provider for its Agent SDK) and Zep sit next to the same buyer with more capital — and could extend down into connectors.

Mitigation

Memory and retrieval are different engineering problems wearing the same category label. Memory layers persist what an agent learned from its own interactions; Airweave syncs and searches what's true in the customer's systems of record, with ACLs replicated from the source apps. Extending from recall quality into a 50-connector OAuth + sync + permissions matrix is years of unglamorous work — the exact work Airweave has already done. The convergence risk runs both directions, and the connector graph is the harder half to replicate.

Risk

Enterprise procurement (SOC 2 Type II, fine-grained RBAC, on-prem and VPC deploys, GDPR) takes years to mature and gates the largest deals.

Mitigation

Airweave's near-term ICP isn't the F500 procurement loop; it's the YC and Seed-to-Series-A AI cohort, where the CTO buys product in an afternoon. SOC 2 and RBAC roll out on the same timeline that won Stripe and Datadog their early-day reputations. The OSS license is itself a procurement-friendly answer for security-sensitive buyers: self-host the core, contract the cloud.

What we're watching

Connector count crossing 100 — and the connector quality bar (incremental sync, ACL replication) holding as breadth grows.
MCP server adoption — Airweave becoming the default retrieval backend behind a meaningful share of the ~18k MCP servers now listed on PulseMCP, a base still compounding double-digit percent per quarter.
Cloud conversion from the OSS install base — the hosted plane's gross-margin profile and the size of the median paying account.
Enterprise pilots crossing $50k ACV — the first time the connector matrix and ACL story carry a real procurement cycle.

References