Do major vector databases support permission-aware retrieval natively?

Most support metadata filtering, which is the substrate for Models 2 and 3 described in this article. Native ACL-aware retrieval (Model 4) is rarer. Check your vector DB's documentation for metadata-filter performance characteristics; some are slow under heavy filtering.

Is per-tenant index always the safest choice?

It is the easiest to reason about and audit. It is not always the safest in practice if your per-tenant indexes are not actually isolated (sharing infrastructure, sharing connection pools, sharing cache). Verify the isolation matches the model on paper.

Can I just rely on the LLM to refuse to disclose unauthorized content?

No. The LLM does not know your ACLs. Once content is in its context, it will use it. The defense has to be at the retrieval boundary, not at the model output.

What is the most common bug in filtered retrieval?

An ACL filter that is applied on one retrieval path but not on all of them. Teams add the filter to the main /search endpoint but forget the /suggest, /complete, or /related endpoints. Each retrieval path needs its own coverage. A pentest finds these every time.
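One way to make this bug harder to write is to give every endpoint a retriever that is already scoped to the caller, so no handler can reach the raw index. The sketch below is illustrative only: the route names come from the paragraph above, but INDEX, scoped_retriever, and handle are hypothetical names, and a substring match stands in for vector search.

```python
# Hypothetical corpus; "groups" is the ACL attached to each document.
INDEX = [
    {"id": "hr-1", "groups": {"hr"}, "text": "salary bands 2026"},
    {"id": "eng-1", "groups": {"eng"}, "text": "salary negotiation tips"},
]

def scoped_retriever(user_groups):
    """Return a retrieve() that can only see documents the caller may read."""
    visible = [d for d in INDEX if d["groups"] & user_groups]

    def retrieve(term):
        # Substring match is a stand-in for similarity search.
        return [d["id"] for d in visible if term in d["text"]]

    return retrieve

# Handlers receive the scoped retriever, never INDEX itself.
def search(retrieve, q):
    return retrieve(q)

def suggest(retrieve, q):
    return retrieve(q)[:3]

def related(retrieve, q):
    return retrieve(q.split()[0])

ROUTES = {"/search": search, "/suggest": suggest, "/related": related}

def handle(path, user_groups, q):
    # The ACL is applied here, once, for every registered route.
    return ROUTES[path](scoped_retriever(set(user_groups)), q)
```

Because the routes live in one table, a test can loop over ROUTES and assert that none of them returns a document outside the caller's groups, which covers a newly added endpoint without anyone remembering to write a test for it.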

Does permission-aware retrieval slow things down measurably?

Yes, typically 50-300ms per query depending on the retrieval model you choose. Latency is usually not the bottleneck. The bigger cost is operational: testing, monitoring, and incident handling are now more complex. Plan for both.


Permission-Aware RAG Retrieval: Stopping Leaks Without Killing Recall

RAG systems retrieve content based on semantic similarity, not access policy. The result is the most common breach shape in AI features: documents one user should not see surfacing in another user's context because the embedding said they were relevant. The fix is permission-aware retrieval, and the tradeoffs are real.

By Austa · Published May 21, 2026 · ~9 min read

Why RAG leaks across permissions by default

A RAG system embeds documents into a vector space. A query gets embedded and compared. The closest matches come back, regardless of which user owns them. The retriever does not know about access control. The retrieved chunks then flow into the LLM's context, the LLM produces a response, and the requesting user sees content that was indexed from a document they were never authorized to read.
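The default behavior can be reproduced in a few lines. This is a toy sketch, not any particular library's retriever: the corpus, the three-dimensional embeddings, and the names naive_retrieve and bob_query are all made up for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical corpus: (doc_id, owner, embedding). Embeddings are invented.
CORPUS = [
    ("salary-bands", "alice", [0.9, 0.1, 0.0]),
    ("lunch-menu", "bob", [0.0, 0.2, 0.9]),
    ("team-offsite", "bob", [0.1, 0.9, 0.1]),
]

def naive_retrieve(query_vec, k=1):
    # Ranks purely by similarity; the caller's identity is never consulted.
    ranked = sorted(CORPUS, key=lambda d: cosine(query_vec, d[2]), reverse=True)
    return [(doc_id, owner) for doc_id, owner, _ in ranked[:k]]

# Bob's query about compensation lands closest to Alice's document.
bob_query = [0.8, 0.2, 0.0]
leaked = naive_retrieve(bob_query)
```

Nothing in naive_retrieve takes a user argument, which is the whole problem: there is no place for an access decision to happen.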

This is the default behavior of most RAG implementations. Every "we built an internal AI search over our company docs" deployment has this pattern unless the team explicitly addressed it. Every public-facing RAG product that ingests user-generated content has it unless they thought about the multi-tenant case from day one.

The leak rarely shows up in normal usage because most queries are about content the user is authorized to see. The leak shows up when an attacker queries with an embedding-friendly description of content they want to extract.

The three retrieval-leak shapes

1. Cross-tenant leakage

The system serves multiple customers (organizations) from a shared vector store. Tenant A's documents come back in Tenant B's queries because the embeddings are commingled. This is the breach shape regulators understand and the one that triggers GDPR or SOC 2 findings.

2. Cross-user leakage within a tenant

Single-tenant deployment, but different users have different document access (HR documents only visible to HR, executive-level docs only visible to executives). The retriever ignores per-user ACLs and returns whatever the embedding finds.

3. Cross-context leakage within a user

Same user, different contexts. The user uploads a confidential document to a private project. Later, a different conversation in a different project retrieves chunks from that document because the embedding is global to the user. The user is authorized to see their own content, but they did not intend it to flow into this other context.

All three shapes have the same root cause and similar fixes, but the impact ranking is roughly cross-tenant > cross-user > cross-context.

Four permission-aware retrieval models

Model 1: Per-tenant index

Each tenant gets its own vector index. Queries against tenant A's index can only return tenant A's documents. Clean isolation, easy to reason about, supports compliance audits. Costs grow with tenant count, and you lose any cross-tenant value (which you usually do not want anyway).

For multi-tenant SaaS, this is the default to start with. The simplicity-to-safety ratio is excellent.
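A minimal sketch of the idea, assuming an in-memory list per tenant as a stand-in for separate collections or databases. TenantIndexes and its methods are hypothetical names, and a plain dot product stands in for whatever similarity metric the real store uses.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class TenantIndexes:
    """One in-memory index per tenant; a stand-in for separate collections."""

    def __init__(self):
        self._indexes = {}

    def add(self, tenant_id, doc_id, embedding):
        self._indexes.setdefault(tenant_id, []).append((doc_id, embedding))

    def search(self, tenant_id, query_vec, k=5):
        # An unknown tenant fails closed with an empty result.
        index = self._indexes.get(tenant_id, [])
        ranked = sorted(index, key=lambda d: dot(query_vec, d[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

store = TenantIndexes()
store.add("tenant-a", "a-roadmap", [1.0, 0.0])
store.add("tenant-b", "b-roadmap", [1.0, 0.0])

# Identical embeddings, but tenant A's query never touches tenant B's index.
results = store.search("tenant-a", [1.0, 0.0])
```

The tenant ID should come from the authenticated session rather than from the request body; otherwise the isolation is only as strong as the client's honesty.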

Model 2: Filtered retrieval on a shared index

Single vector index for the whole system. Each document is tagged with metadata (tenant ID, ACL, classification). Retrieval is "find similar AND match this filter." Most vector databases support metadata filters, though performance under heavy filtering varies by product.

This is what RAGGuard, SafeKey, and similar tools implement under the hood. It is more flexible than per-tenant indexes (a user with cross-tenant authorization can search across tenants if policy allows). The risk is filter correctness: if the filter logic has a bug, you have a cross-tenant leak in a system designed to prevent exactly that.
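The "find similar AND match this filter" shape can be sketched as follows. This does not show how any named tool or vector database implements it; Doc, Principal, is_authorized, and filtered_search are hypothetical, and the ACL is modeled as a set of group names. Note the fail-closed choice: a document with an empty ACL matches no one.

```python
from dataclasses import dataclass

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

@dataclass(frozen=True)
class Doc:
    doc_id: str
    tenant_id: str
    allowed_groups: frozenset
    embedding: tuple

@dataclass(frozen=True)
class Principal:
    tenant_id: str
    groups: frozenset

def is_authorized(principal, doc):
    # Both conditions must hold; an empty ACL never intersects, so it denies.
    same_tenant = doc.tenant_id == principal.tenant_id
    return same_tenant and bool(doc.allowed_groups & principal.groups)

def filtered_search(index, principal, query_vec, k=5):
    # Filter first, then rank only what the caller may see.
    candidates = [d for d in index if is_authorized(principal, d)]
    candidates.sort(key=lambda d: dot(query_vec, d.embedding), reverse=True)
    return [d.doc_id for d in candidates[:k]]

index = [
    Doc("a1", "A", frozenset({"hr"}), (1.0, 0.0)),
    Doc("a2", "A", frozenset({"eng"}), (0.9, 0.1)),
    Doc("b1", "B", frozenset({"hr"}), (1.0, 0.0)),
    Doc("a3", "A", frozenset(), (1.0, 0.0)),  # empty ACL: visible to no one
]
hr_user = Principal("A", frozenset({"hr"}))
hits = filtered_search(index, hr_user, (1.0, 0.0))
```

Keeping the predicate in one small function makes the filter-correctness risk reviewable: it is the only code that decides who sees what.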

Model 3: Post-retrieval filtering

Retrieve broadly, then filter the results against the requesting user's ACL before returning. Conceptually simple, works with any vector store. The catch: you may need to retrieve far more candidates than you want to return (your top-10 result post-filter might require top-100 pre-filter to have enough authorized matches), which costs latency and bandwidth.

This model works well as a defense-in-depth layer added on top of Model 2: filter at retrieval time AND filter the results, so a bug in either layer is caught by the other.
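The over-fetch pattern might look like the sketch below. It assumes a search_fn(query, n) callable that returns the top n candidates in rank order and an is_allowed(candidate) predicate; both are hypothetical interfaces, as are the parameter names. The default overfetch of 10 mirrors the top-100-for-top-10 example above.

```python
def post_filter_search(search_fn, is_allowed, query, k=10, overfetch=10,
                       max_candidates=1000):
    """Fetch more than k candidates, drop unauthorized ones, return the top k."""
    n = k * overfetch
    while True:
        candidates = search_fn(query, n)
        allowed = [c for c in candidates if is_allowed(c)]
        exhausted = len(candidates) < n  # the store has nothing more to give
        if len(allowed) >= k or exhausted or n >= max_candidates:
            return allowed[:k]
        # Too few authorized matches: widen the net, up to the cap.
        n = min(n * 2, max_candidates)

# Toy store: 200 ranked items, of which only every 25th is authorized.
corpus = list(range(200))
calls = []

def search_fn(query, n):
    calls.append(n)
    return corpus[:n]

top = post_filter_search(search_fn, lambda c: c % 25 == 0, "q", k=3)
```

In this toy run the first fetch of 30 candidates yields only two authorized items, so the function widens to 60 before it can return three. The max_candidates cap bounds the latency cost for callers with very narrow access.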

Model 4: Re-ranking with access scores

Retrieve broadly, re-rank the results with a model that considers both relevance and the requesting user's authorization level. The re-ranker can demote (but not exclude) results the user has partial authority for, and exclude results they have no authority for.

More complex than the other models. It pays off in deployments where access control is graded rather than binary: a partially authorized document can still appear, ranked lower, while highly confidential matches are never surfaced.
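A deliberately simplified sketch of the demote-or-exclude behavior. It uses a fixed multiplier rather than a learned re-ranking model, and the three access levels (0 = none, 1 = partial, 2 = full), the partial_penalty value, and the function name are all assumptions made for illustration.

```python
def rerank_with_access(candidates, access_level, k=5, partial_penalty=0.5):
    """candidates: list of (doc_id, relevance); access_level: doc_id -> 0, 1 or 2."""
    scored = []
    for doc_id, relevance in candidates:
        level = access_level.get(doc_id, 0)  # unknown documents: no authority
        if level == 0:
            continue  # exclude outright, never merely demote
        weight = 1.0 if level == 2 else partial_penalty
        scored.append((relevance * weight, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

candidates = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.95)]
access = {"a": 1, "b": 2, "c": 2}  # "d" is absent, so it is treated as level 0
ranked = rerank_with_access(candidates, access)
```

Document "d" has the highest relevance but is dropped because the caller has no authority for it, while "a" survives with a halved score and falls behind "b" and "c".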

Tradeoffs you cannot escape

Permission-aware retrieval costs you on three axes:

Recall. Strict filtering means some genuinely relevant content gets excluded. Users with broad access get the same results they would without filtering; users with narrow access see less. This is correct behavior, but it surfaces as "the AI search does not find things for me" complaints.

Latency. Filters and re-rankers add work to every query. Per-tenant indexes are usually fastest; post-retrieval filtering and re-ranking add 50-300ms.

Operational complexity. The retrieval pipeline now has access-control logic, which means tests need ACL coverage, monitoring needs leak detection, and incident response needs to handle "did we leak across permissions" as a first-class category. Most teams underweight this until they have an incident.

Accepting the tradeoffs is the point. RAG that does not respect permissions is wrong; making it right has costs.

How to test for retrieval leaks

A focused pentest of a RAG system's permission boundaries:

Build a small adversarial corpus. Plant 20-30 documents across different tenants/users/contexts with distinctive content (UUIDs, unique phrases) that cannot occur by accident.
Run queries from each authorization scope with embedding-friendly descriptions of the planted content.
Check what the LLM returns. If User B's query produces an answer mentioning User A's UUID, you have a leak.
Vary query phrasing. Embeddings are sensitive to phrasing. The leak may not show up on the obvious query but may show up on a paraphrase.
Test edge conditions. Empty ACLs, deleted users, rotated tenant IDs, recently revoked access. These are where filter bugs live.
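The steps above can be sketched as a small harness. Everything here is hypothetical scaffolding: plant_canaries and find_leaks are illustrative names, ask(scope, prompt) stands for whatever function calls your RAG pipeline as a given principal, and leaky_ask is a fake pipeline included only so the sketch runs on its own.

```python
import uuid

def plant_canaries(scopes):
    """One unguessable token per scope; index each in a document owned by that scope."""
    return {scope: f"canary-{uuid.uuid4()}" for scope in scopes}

def find_leaks(canaries, ask, probes):
    """Return (asking_scope, owning_scope, probe) for every cross-scope canary hit."""
    leaks = []
    for asker in canaries:
        for probe in probes:
            answer = ask(asker, probe)
            for owner, token in canaries.items():
                if owner != asker and token in answer:
                    leaks.append((asker, owner, probe))
    return leaks

canaries = plant_canaries(["tenant-a", "tenant-b"])

def leaky_ask(scope, prompt):
    # Fake pipeline for demonstration: leaks only on a paraphrased probe.
    if scope == "tenant-b" and "paraphrase" in prompt:
        return "The code is " + canaries["tenant-a"]
    return "No results."

leaks = find_leaks(canaries, leaky_ask, ["obvious query", "paraphrase query"])
```

Running every probe from every scope is quadratic in the number of scopes, which is acceptable for a corpus of 20-30 planted documents and makes a good regression suite once the first leak is fixed.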

The minimum bar: if you have a RAG system serving more than one principal (more than one user, more than one tenant, more than one context), you have multi-permission concerns. Most teams add permission-awareness reactively, after a customer asks why their data showed up somewhere it should not. Be proactive.

The poisoning side-channel

Permission-aware retrieval addresses the leak direction (User B should not see User A's content). It does not address the reverse: an attacker with write access to one tenant planting content that, when retrieved by an admin user, executes a prompt injection. That is the same indirect-prompt-injection family covered in document parsers as injection vectors, applied to the RAG store.

A complete RAG threat model needs both directions: leak (data flowing across permissions in the read direction) and poisoning (instructions flowing across permissions in the influence direction). We treat the integrity side in depth in RAG poisoning and knowledge-corruption attacks, including how to red-team the write paths into your corpus.

Related

Document parsers as prompt injection vectors covers the poisoning side-channel.
RAG poisoning and knowledge-corruption attacks covers the integrity side of the corpus boundary and how to pentest the write paths.
KV leak channels covers the parallel surface in agent state stores.
The 2026 LLM security checklist covers retrieval controls in the memory-and-RAG category.
