Multi-tenant RAG: how to isolate customer data and stop leaks
Most retrieval-augmented generation demos run on a single corpus. The moment you point one at customer data inside a multi-tenant SaaS product, the hard problem is no longer retrieval quality; it is making sure tenant A can never retrieve a chunk that belongs to tenant B. A single missing filter in a query, a background job that runs without tenant context, or a cache key that omits the tenant id is enough to leak one customer's documents into another customer's answer.
This is a guide to the isolation decisions that actually matter when you ship RAG over customer data: how to pick an index layout, where to enforce the tenant boundary, and the specific leak tests to run before launch.
Pick an isolation model before you write retrieval code
There are three common layouts, and the right one depends on tenant count and contract requirements.
Silo: one index or namespace per tenant. Strongest isolation, because a query physically cannot reach another tenant's vectors. The cost is operational: you manage many indexes and pay for headroom on each. This fits enterprise customers with strict data-residency or single-tenant contractual terms.
Pool: one shared index with a tenant id stored as metadata on every vector, filtered on each query. Cheapest and simplest to operate, but every query depends on a correct filter, so the blast radius of a bug is the whole customer base. This fits high-volume SMB products.
Bridge: a hybrid, with a pool for small tenants and a dedicated silo for large or regulated ones. Most SaaS products land here as they grow.
Write the choice down and make it explicit in code. A pool-now, silo-later migration is far cheaper to plan up front than to retrofit after a customer asks for single-tenant storage.
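One way to make the layout explicit in code is a single resolver that every retrieval call goes through. The sketch below assumes a bridge layout; the `Isolation` enum, the `SILO_TENANTS` registry, and the index names are hypothetical and not tied to any particular vector database.

```python
from dataclasses import dataclass
from enum import Enum


class Isolation(Enum):
    POOL = "pool"  # shared index, tenant id applied as a metadata filter
    SILO = "silo"  # dedicated index or namespace per tenant


@dataclass(frozen=True)
class RetrievalTarget:
    index: str           # physical index or namespace to query
    tenant_filter: dict  # metadata filter to apply (empty in silo mode)


# Hypothetical registry: any tenant not listed here lives in the shared pool.
SILO_TENANTS = {"acme-bank": "idx-acme-bank"}
POOL_INDEX = "idx-shared"


def isolation_for(tenant_id: str) -> Isolation:
    return Isolation.SILO if tenant_id in SILO_TENANTS else Isolation.POOL


def resolve_target(tenant_id: str) -> RetrievalTarget:
    """Map a tenant to where and how its queries must run."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    if isolation_for(tenant_id) is Isolation.SILO:
        return RetrievalTarget(index=SILO_TENANTS[tenant_id], tenant_filter={})
    return RetrievalTarget(index=POOL_INDEX, tenant_filter={"tenant_id": tenant_id})
```

Because pool and silo tenants both go through `resolve_target`, moving a tenant later is a registry change rather than a hunt through every query site.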
Enforce the tenant boundary at the data layer, not the prompt
The most common mistake is treating the model as a gatekeeper, for example instructing it to only answer from tenant 14's documents. Language models are not an access-control layer. A clever prompt injection can talk them out of the instruction, they will quote a wrong-tenant chunk if retrieval hands them one, and neither failure leaves an audit trail. The tenant filter has to be deterministic and enforced before the model ever sees a chunk.
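A minimal sketch of enforcing the filter before anything reaches the model, using a toy in-memory index in place of a real vector store. `PoolIndex`, `Chunk`, and the brute-force cosine ranking are illustrative assumptions. The tenant id is a required argument, and other tenants' chunks are dropped before ranking, so they can never end up in a prompt. Real vector databases differ in how they combine metadata filters with approximate search, so check your store's documentation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    vector: tuple


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PoolIndex:
    """Toy shared index; a stand-in for a real vector store."""

    def __init__(self):
        self._chunks = []

    def add(self, chunk: Chunk):
        self._chunks.append(chunk)

    def search(self, tenant_id: str, query_vector, k: int = 3):
        # tenant_id is a required positional argument, not an optional filter.
        if not tenant_id:
            raise ValueError("tenant_id is required for every search")
        # Filter first, then rank: other tenants' chunks are never even scored.
        candidates = [c for c in self._chunks if c.tenant_id == tenant_id]
        candidates.sort(key=lambda c: cosine(c.vector, query_vector), reverse=True)
        return candidates[:k]
```

Even when two tenants hold identical vectors, `search("tenant-a", ...)` can only ever return tenant A's rows.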
In practice that means the tenant id is derived server-side from the authenticated session or a signed JWT claim, never from a request body the client can edit. The retrieval call applies that id as a hard filter: a namespace in silo mode, a metadata equality filter in pool mode. The same id flows into your cache key, your reranker input, and any logging.
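Deriving the tenant id server-side can be sketched with only the Python standard library for an HS256-signed JWT. The `tenant_id` claim name and `handle_query` are assumptions for illustration; in production you would normally use a maintained JWT library and your identity provider's key setup rather than hand-rolled verification.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def tenant_from_token(token: str, secret: bytes) -> str:
    """Return the tenant claim from an HS256-signed JWT, raising if it is not valid."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise PermissionError("malformed token")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison of the signature bytes.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():  # a missing exp is treated as expired
        raise PermissionError("token expired")
    tenant_id = claims.get("tenant_id")
    if not tenant_id:
        raise PermissionError("no tenant claim")
    return tenant_id


def handle_query(token: str, body: dict, secret: bytes) -> dict:
    # The tenant comes only from the verified token; any "tenant_id"
    # the client put in the request body is ignored.
    tenant_id = tenant_from_token(token, secret)
    return {"tenant_id": tenant_id, "question": body["question"]}
```

The scoped request returned by `handle_query` is what should flow into retrieval, the cache key, the reranker, and logs.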
The places the boundary silently breaks
Cross-tenant leaks rarely come from one obvious hole. They come from the edges:
- A background re-embedding job that iterates all documents without re-applying the tenant scope.
- An evaluation or analytics query written by a different team that forgets the filter.
- A cache keyed on the question text but not the tenant id, so tenant B gets tenant A's cached answer.
- An admin or internal tool that reads across tenants for debugging and is left wired into production.
Each of these runs outside the request path where your main filter lives, which is exactly why they get missed in review.
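For the cache bullet, a defensive pattern is to make the tenant id a mandatory part of the key so an untenanted key cannot be built at all. This is a sketch under assumptions: `cache_key` and `AnswerCache` are hypothetical, the in-process dict stands in for Redis or similar, and the question normalization (lowercasing, collapsing whitespace) is just one reasonable choice.

```python
import hashlib
import json


def cache_key(tenant_id: str, question: str) -> str:
    """Build a cache key in which the tenant id is a required component."""
    if not tenant_id:
        raise ValueError("refusing to build a cache key without a tenant id")
    normalized = " ".join(question.lower().split())
    # Hash a structured payload so identical questions from two tenants
    # still produce different keys.
    payload = json.dumps({"tenant": tenant_id, "q": normalized}, sort_keys=True)
    return "rag:" + hashlib.sha256(payload.encode()).hexdigest()


class AnswerCache:
    def __init__(self):
        self._store = {}

    def get(self, tenant_id: str, question: str):
        return self._store.get(cache_key(tenant_id, question))

    def put(self, tenant_id: str, question: str, answer: str):
        self._store[cache_key(tenant_id, question)] = answer
```

The same idea applies to the background-job bullet: a re-embedding job whose entry point takes a required `tenant_id` argument is harder to run unscoped than one that iterates the whole corpus.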
Test for leaks like an attacker, before launch
Isolation is not something you confirm by reading code. Build a small adversarial test set with at least two synthetic tenants that hold deliberately distinct, identifiable facts (tenant A: Project Falcon ships in March; tenant B: Project Heron ships in May). Then:
- Ask tenant A's assistant about tenant B's fact and assert both that retrieval returns nothing from B and that the model says it does not know.
- Run the same probe through every entry point: chat, search, any API, and any async summary job.
- Replay the probe with a tampered tenant id in the request body to confirm the server ignores it and uses the session claim.
- Warm the cache as tenant A, then query as tenant B with the same question, and assert no shared answer.
Wire these into CI so a refactor that drops a filter fails the build instead of shipping. This is the same discipline we cover in our notes on common RAG evaluation mistakes, applied to security rather than answer quality.
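The probes can be expressed as ordinary unit tests. The sketch below runs them against `ToyRagService`, a hypothetical stand-in with two synthetic tenants and naive keyword retrieval; in a real suite you would point the same assertions at each of your entry points (chat, search, API, async jobs) instead.

```python
import unittest

STOPWORDS = {"when", "does", "project", "ship", "ships", "in", "what", "is"}


class ToyRagService:
    """Hypothetical stand-in for the real service under test (pool layout)."""

    def __init__(self):
        self.chunks = [
            {"tenant_id": "tenant-a", "text": "Project Falcon ships in March"},
            {"tenant_id": "tenant-b", "text": "Project Heron ships in May"},
        ]
        self.cache = {}

    def retrieve(self, session_tenant, question):
        terms = {w.strip("?.,").lower() for w in question.split()} - STOPWORDS
        return [c for c in self.chunks
                if c["tenant_id"] == session_tenant
                and terms & set(c["text"].lower().split())]

    def ask(self, session_tenant, body):
        # The tenant comes from the session; body["tenant_id"] is ignored.
        key = (session_tenant, body["question"])
        if key not in self.cache:
            hits = self.retrieve(session_tenant, body["question"])
            self.cache[key] = {
                "answer": hits[0]["text"] if hits else "I don't know.",
                "sources": [c["tenant_id"] for c in hits],
            }
        return self.cache[key]


class LeakTests(unittest.TestCase):
    def setUp(self):
        self.svc = ToyRagService()

    def test_cross_tenant_fact_is_not_retrieved(self):
        out = self.svc.ask("tenant-a", {"question": "When does Project Heron ship?"})
        self.assertNotIn("tenant-b", out["sources"])
        self.assertIn("don't know", out["answer"])

    def test_tampered_body_tenant_is_ignored(self):
        body = {"tenant_id": "tenant-b", "question": "When does Project Heron ship?"}
        self.assertNotIn("Heron", self.svc.ask("tenant-a", body)["answer"])

    def test_cache_is_not_shared_across_tenants(self):
        q = {"question": "When does Project Falcon ship?"}
        self.assertIn("Falcon", self.svc.ask("tenant-a", q)["answer"])
        self.assertNotIn("Falcon", self.svc.ask("tenant-b", q)["answer"])
```

Run it with `python -m unittest` in CI. If someone later keys the cache on the question alone, `test_cache_is_not_shared_across_tenants` fails, which is exactly the regression you want to catch.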
Retrieval quality still matters per tenant
Isolation does not make answers good; it only keeps them separate. Each tenant's corpus has different size, vocabulary, and freshness, so chunking and reranking choices that worked in your demo may underperform on a small tenant with sparse data. The retrieval fundamentals in our guide to production RAG architecture and the tradeoffs in chunking strategies for retrieval quality both still apply, and a two-stage reranking pass helps most when a tenant's corpus is large and noisy.
If you are also fighting wrong answers, treat grounding and isolation as separate problems; the techniques to reduce hallucinations in production RAG are orthogonal to the tenant boundary.
Frequently asked questions
Is a shared (pool) index safe for multi-tenant RAG?
Yes, if the tenant filter is deterministic, derived server-side, and applied on every query and background job, and you have automated leak tests. The risk is operational, not theoretical: one missed filter exposes everyone, so pool layouts demand stricter test coverage than silo layouts.
Should I encrypt each tenant's vectors separately?
Per-tenant encryption helps for data-at-rest and residency requirements, and it is often required by enterprise contracts. It does not replace query-time filtering: an encrypted vector retrieved into the wrong tenant's context is still a leak.
Can I rely on the model to respect tenant boundaries if I tell it to?
No. Prompt-level instructions are not access control. Enforce the boundary in retrieval and pass the model only chunks the tenant is allowed to see.
When should I move a tenant from pool to silo?
Usually when a contract requires single-tenant storage or data residency, when one tenant's volume creates noisy-neighbor latency, or when the customer is large enough that a shared-index bug would be an unacceptable risk. Plan the migration path while you are still on pool.
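A pool-to-silo move can be planned as copy, verify, flip routing, then delete. This sketch assumes a toy store of `(tenant_id, chunk_id, vector)` rows and a plain dict for routing; `VectorStore` and `migrate_to_silo` are hypothetical, and it ignores writes that arrive during the copy, which a real migration would handle with dual writes or a brief write freeze.

```python
from collections import defaultdict


class VectorStore:
    """Toy store: named indexes, each a list of (tenant_id, chunk_id, vector) rows."""

    def __init__(self):
        self.indexes = defaultdict(list)

    def tenant_rows(self, index, tenant_id):
        return [r for r in self.indexes[index] if r[0] == tenant_id]


def migrate_to_silo(store, routing, tenant_id, pool_index="shared", silo_index=None):
    """Copy a tenant out of the pool, verify, flip routing, then delete the pool copy."""
    silo_index = silo_index or f"tenant-{tenant_id}"
    rows = store.tenant_rows(pool_index, tenant_id)
    # 1. Copy the tenant's rows into the dedicated index.
    store.indexes[silo_index].extend(rows)
    # 2. Verify before switching traffic: same chunk ids, no foreign rows.
    copied = store.indexes[silo_index]
    if {r[1] for r in copied} != {r[1] for r in rows}:
        raise RuntimeError("copy incomplete; routing left on pool")
    if any(r[0] != tenant_id for r in copied):
        raise RuntimeError("foreign rows in silo; routing left on pool")
    # 3. Flip routing so new queries go to the silo.
    routing[tenant_id] = silo_index
    # 4. Only now remove the tenant's rows from the shared pool.
    store.indexes[pool_index] = [r for r in store.indexes[pool_index] if r[0] != tenant_id]
    return silo_index
```

Deleting from the pool only after routing has flipped means a failed verification leaves the tenant fully served from the pool.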
Shipping RAG over real customer data is mostly a discipline problem: choose an isolation model on purpose, enforce the boundary in code, and prove it with adversarial tests. If you want senior engineers to build or review a multi-tenant RAG feature, see what we build and how the engagement works.