RAG Systems and the Authorisation Problem — Why Vector-DB Access is Under-Regulated

The regulatory framework for AI data access has not kept pace with the proliferation of retrieval-augmented generation systems, and the industry's standardised response—role-based access control layered on top of database permissions—is architecturally inadequate to prevent both authorised overreach and compromised-credential abuse at scale.

The momentum behind retrieval-augmented generation (RAG) is undeniable. Large language models deployed in enterprise contexts now routinely ingest document repositories, knowledge bases, and transactional records via vector databases—Pinecone, Weaviate, Qdrant, Milvus—to ground inference in organisational data. The use case is compelling: better accuracy, lower hallucination, faster time-to-answer for knowledge workers. What is less often discussed is that RAG architectures have introduced a new attack surface that sits orthogonal to existing security controls, and regulatory bodies have not yet formulated coherent rules for its governance.

The bind is straightforward. When a user queries a RAG system, the application performs semantic search across embeddings to retrieve context, then feeds that context plus the user's query to the LLM. If the user is authorised to read the source document in the original database, they ought to be authorised to retrieve it via the RAG pipeline. If they are not authorised, the system must block retrieval. But most commercial RAG implementations achieve this by applying database-level access controls (SQL RBAC, row-level security, column-level masking) before vectorisation, or by tagging embeddings with user/role metadata and filtering results after semantic search. Both patterns fail under realistic threat scenarios. The first creates a synchronisation problem. The second creates what we might call an access-control-orthogonal vulnerability: an attacker who compromises the RAG application layer or the vector database itself can bypass the access-control logic entirely, because the semantic distance between a prompt and an embedding makes no distinction between authorised and unauthorised retrievals.

This is not a theoretical problem. In mid-2024, Snowflake and Mandiant disclosed a campaign in which attackers accessed customer data warehouses using stolen credentials. The initial attack vector was conventional: according to Mandiant's reporting, the credentials had been harvested by infostealer malware, and the affected accounts lacked multi-factor authentication. But the scale and dwell time revealed a second-order failure: once inside a Snowflake instance, attackers could enumerate and exfiltrate data at a pace that detection systems could not match. Roughly 165 customer organisations were notified as potentially exposed, and reportedly over 400 GB of sensitive records had been taken before forensic investigation began. What made the breach particularly damaging was the asymmetry between the velocity of an attacker's data access and the velocity of the organisation's detection. In the presence of a RAG system connected to that same warehouse, a configuration that is increasingly common in financial services and healthcare, the attacker would have gained not just SQL read access but semantic query capability: the ability to ask the LLM-driven system arbitrary questions about the data, which the system would dutifully answer by retrieving embeddings from the vector store. The embedding retrieval would not necessarily trigger the same audit logs as a direct SQL query, because the semantic layer adds a translation step that obscures the origin of the request.

The Industry's Standard Response and Its Architectural Failure

The security community's default posture on RAG access control is to extend existing RBAC patterns. Organisations are advised to tag embeddings with user/group identifiers, then apply runtime access-control filtering at the RAG application layer before returning results to the user. This approach is consistent with the general access-control guidance in frameworks like the NIST AI Risk Management Framework (AI RMF 1.0), and it has the superficial appeal of compatibility with existing security policies and compliance regimes. DORA (Digital Operational Resilience Act), which has applied since 17 January 2025, explicitly requires financial institutions to apply "sound access controls" to data used in algorithmic decision-making; RAG deployments supporting algorithmic lending or risk assessment will fall under this requirement. NIS2, the EU's updated directive on network and information security, obliges "advanced cybersecurity tools and practices" including "access and authentication controls" for critical infrastructure; member states will apply this to healthcare, energy, and finance verticals that increasingly rely on RAG for operational intelligence.

Yet the standard implementation is flawed at its foundation. Consider the architecture: a user submits a query to a RAG application. The application includes the user's identity in the request context. The application vectorises the query and issues a semantic search against the vector database. The vector database returns the top k most similar embeddings, each tagged with metadata including the owning user or role. The RAG application then filters the results, removing any embeddings tagged with a user or role the requester is not authorised to access. Finally, the application retrieves the full text of the corresponding documents from the primary database (SQL, document store, etc.) and returns them.
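The search-then-filter flow described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: `Embedding`, `vector_search`, `rag_retrieve`, the role names, and the three-dimensional vectors are all hypothetical, and an in-memory list stands in for the vector database.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Embedding:
    doc_id: str
    vector: tuple[float, ...]
    allowed_roles: frozenset[str]  # ACL metadata stored next to the vector


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(store, query_vec, k):
    """Stand-in for the vector database: ranks by similarity, ignores ACLs."""
    return sorted(store, key=lambda e: cosine(e.vector, query_vec), reverse=True)[:k]


def rag_retrieve(store, query_vec, user_roles, k=3):
    """Application-layer retrieval: search first, filter by role afterwards."""
    hits = vector_search(store, query_vec, k)
    return [e.doc_id for e in hits if e.allowed_roles & user_roles]


STORE = [
    Embedding("hr-salaries", (0.9, 0.1, 0.0), frozenset({"hr"})),
    Embedding("eng-handbook", (0.1, 0.9, 0.0), frozenset({"eng", "hr"})),
    Embedding("public-faq", (0.0, 0.1, 0.9), frozenset({"eng", "hr", "guest"})),
]

query = (1.0, 0.2, 0.0)
# Through the application, an "eng" user never sees the HR document.
via_app = rag_retrieve(STORE, query, user_roles=frozenset({"eng"}))
# A caller that reaches the store directly skips the filter altogether.
direct = [e.doc_id for e in vector_search(STORE, query, k=3)]
```

The only thing separating `via_app` from `direct` is a list comprehension in application code; the store itself ranks and returns every embedding regardless of who asks.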

Three failure modes are immediate.

First: If the vector database or the RAG application is compromised, the access-control filter can be bypassed entirely. The attacker can query the vector database directly, bypassing the application layer, and retrieve embeddings that have not been subject to access-control validation. The MOVEit Transfer zero-day (CVE-2023-34362), a SQL injection flaw that attackers used to install a web shell and pull data from compromised servers, illustrates this pattern: once inside the perimeter, the attacker's access was independent of the file-level permissions that the application intended to enforce. The attacker could simply enumerate and exfiltrate stored files, regardless of intended access policy.

Second: Metadata tagging creates a new class of privilege-escalation vulnerability. If an attacker can mutate the metadata on an embedding, whether by writing directly to the vector database or by poisoning the embedding generation pipeline, they can change its user/role tag so that a protected embedding is returned for a different user's query. Vector databases like Pinecone and Weaviate support metadata updates via their public APIs; access control to these APIs is typically enforced at the application level, not the database level. A compromise of the application layer, or a misconfigured API key, exposes this vector.

Third: Synchronisation between the primary data store and the vector database is weak. If a user's access to a document is revoked in the SQL database, the corresponding embeddings in the vector store may remain unchanged and retrievable until the embeddings are regenerated or manually deleted. In systems with high document velocity (continuous data ingestion, rapid permission changes), this window of inconsistency can be arbitrarily large. The Medibank incident (2022) exposed the records of roughly 9.7 million current and former customers; the investigation revealed that access controls had been misconfigured across multiple data stores, and that synchronisation failures had allowed attackers to access data they should not have been able to reach. The forensic finding was that there was no single source of truth for permission state.
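The third failure mode is easy to reproduce in miniature. The sketch below assumes a batch job that copies permission state from the primary store into the vector store; `PrimaryStore`, `VectorStore`, `sync_from`, and the document and role names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryStore:
    acl: dict[str, set[str]] = field(default_factory=dict)  # doc_id -> roles

    def revoke(self, doc_id, role):
        self.acl[doc_id].discard(role)


@dataclass
class VectorStore:
    tags: dict[str, set[str]] = field(default_factory=dict)  # copied ACL tags

    def sync_from(self, primary):
        """Batch job: copy permission state from the primary store."""
        self.tags = {doc: set(roles) for doc, roles in primary.acl.items()}

    def retrievable(self, doc_id, role):
        return role in self.tags.get(doc_id, set())


primary = PrimaryStore(acl={"contract-17": {"legal", "sales"}})
vectors = VectorStore()
vectors.sync_from(primary)

primary.revoke("contract-17", "sales")
# Until the next sync, the vector store still honours the revoked role.
stale = vectors.retrievable("contract-17", "sales")
vectors.sync_from(primary)
fresh = vectors.retrievable("contract-17", "sales")
```

Between the revocation and the next `sync_from` call the two stores disagree, and nothing in this design bounds how long that interval lasts.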

The regulatory response has been slow and fragmented. The SEC's guidance on "AI governance and risk management" (issued December 2024) does not specifically address RAG or vector-database access control; it gestures toward existing frameworks like NIST CSF and ISO 27001, both of which predate vector databases as a class of infrastructure. The FCA's Handbook (COBS Chapter 8, updated 2024) requires firms using algorithmic tools to "ensure that appropriate controls and governance procedures are in place," but does not mandate any specific architecture for data access within those tools. The result is regulatory ambiguity: organisations can satisfy formal compliance by implementing a credible-sounding RBAC scheme, even if that scheme is architecturally vulnerable to the threat models we have described.

The Structural Problem: Access Control Cannot Be Orthogonal to the Data Plane

The PULSE doctrine inverts the conventional thinking here. Rather than bolting access control onto a data retrieval system—whether that system is a SQL database, a document store, or a vector database—the access-control logic must be intrinsic to the data plane itself. The data must be encrypted, fragmented, or otherwise materialised in such a way that an attacker who gains read access to the underlying storage (whether the vector database, the embedding index, or the primary document repository) cannot extract meaningful information without possessing the appropriate cryptographic keys or credentials.

This is the zero-knowledge substrate principle: you cannot steal what is not there. In the context of RAG, "what is not there" is the unencrypted document text; "what is there" is only the ciphertext and the encrypted embeddings, neither of which yields usable information without the key.

Consider the current state of vector-database access control. Pinecone, Weaviate, Qdrant, and Milvus all offer some form of RBAC and metadata tagging. None of them offer encryption of embeddings at rest with per-document or per-user keys. None of them separate the control plane (permission metadata) from the data plane (embeddings and similarity computations) in a way that prevents a compromised data-plane component from leaking information to an attacker. Most vector databases store embeddings in unencrypted form; the metadata tagging is stored alongside or in an indexed structure that is also unencrypted.

An alternative architecture would encrypt each embedding or each document's embedding cluster with a key derived from the user or role that owns that document. The RAG application would then perform semantic search by:

Generating a query embedding from the user's input.
Encrypting the query embedding under a key derived from the user's credentials.
Performing similarity search against the encrypted embeddings.
Returning only results encrypted under a key the user holds, since no other result can be decrypted.

This is cryptographically expensive and requires domain-specific acceleration, but it fundamentally changes the threat model. An attacker who gains read access to the vector database cannot make use of the embeddings without the key, and cannot forge a query embedding that matches another user's embeddings, because the encryption is bound to the user's credential set.
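The key-binding idea can be shown with a deliberately simple stand-in. The sketch below does not implement encrypted similarity search; it swaps in a key-derived orthogonal transform (a permutation plus sign flips), which preserves dot products between vectors transformed under the same key. It is a toy for illustrating the data flow and is not a secure scheme: transforms of this kind leak vector norms and are known to be attackable. The function name, key values, and vectors are all hypothetical.

```python
import hashlib
import hmac
import random


def keyed_transform(vector, key):
    """Apply a key-derived permutation and sign flip (an orthogonal map).

    Toy only: preserves dot products under one key, but is NOT encryption.
    """
    seed = hmac.new(key, b"embedding-transform", hashlib.sha256).digest()
    rng = random.Random(seed)  # deterministic per key
    order = list(range(len(vector)))
    rng.shuffle(order)
    signs = [rng.choice((-1.0, 1.0)) for _ in vector]
    return [signs[i] * vector[order[i]] for i in range(len(vector))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


finance_key = b"hypothetical-finance-role-key"
other_key = b"hypothetical-other-role-key"

doc = [0.12, -0.40, 0.33, 0.08, -0.71, 0.25, 0.05, -0.38]
query = [0.10, -0.35, 0.30, 0.02, -0.80, 0.20, 0.00, -0.30]

stored = keyed_transform(doc, finance_key)  # what the index would hold
same_key = dot(stored, keyed_transform(query, finance_key))
wrong_key = dot(stored, keyed_transform(query, other_key))
plain = dot(doc, query)
```

Here `same_key` matches `plain` up to floating-point rounding, so ranking still works for the key holder, while `wrong_key` will in general differ because the two permutations do not line up. A production design would need a real construction, such as the functional-encryption or PIR approaches the article mentions elsewhere, rather than this transform.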

A second architectural move is to separate the embedding vector space from the retrieval logic. Rather than storing vectors alongside user/role tags and filtering post-hoc, the system could materialise separate embedding indices per user or per role, such that a query against the index cannot return results outside the user's authorisation boundary. This requires more storage and recomputation, but it eliminates the metadata-mutation attack and simplifies the synchronisation problem.
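A per-role index layout might look like the following sketch. `PartitionedIndex` and its methods are hypothetical names, the vectors are illustrative, and an in-memory dictionary stands in for what would be physically separate indices.

```python
import math
from collections import defaultdict


def _cosine(a, b):
    d = sum(x * y for x, y in zip(a, b))
    n = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return d / n if n else 0.0


class PartitionedIndex:
    """One independent embedding index per role; no shared ACL tags."""

    def __init__(self):
        self._by_role = defaultdict(list)  # role -> [(doc_id, vector)]

    def add(self, doc_id, vector, roles):
        for role in roles:  # storage cost: one copy per authorised role
            self._by_role[role].append((doc_id, vector))

    def revoke(self, doc_id, role):
        self._by_role[role] = [(d, v) for d, v in self._by_role[role] if d != doc_id]

    def search(self, query_vec, roles, k=3):
        """Only the caller's own partitions are ever scanned."""
        candidates = {}
        for role in roles:
            for doc_id, vec in self._by_role.get(role, []):
                candidates[doc_id] = _cosine(vec, query_vec)
        return sorted(candidates, key=candidates.get, reverse=True)[:k]


index = PartitionedIndex()
index.add("hr-salaries", (0.9, 0.1, 0.0), {"hr"})
index.add("eng-handbook", (0.1, 0.9, 0.0), {"eng", "hr"})
index.add("public-faq", (0.0, 0.1, 0.9), {"eng", "hr", "guest"})

eng_hits = index.search((1.0, 0.2, 0.0), roles={"eng"})
index.revoke("eng-handbook", "eng")
eng_hits_after = index.search((1.0, 0.2, 0.0), roles={"eng"})
```

Because there is no tag to rewrite, the metadata-mutation attack has nothing to target, and revocation becomes a deletion from one partition rather than a tag update that must be reconciled later. The cost is the duplicated storage visible in `add`.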

Adaptive Active Defence and Continuous Adversarial Drift

The PULSE doctrine also emphasises adaptive posture: the system does not merely defend a fixed perimeter; it continuously adjusts its operational configuration in response to observed or modelled threats. In the context of RAG access control, this means embedding anomaly detection directly into the data plane, not as a bolt-on SIEM or SOAR platform that processes logs after-the-fact.

For example, an organisation could implement a substrate where:

Every semantic search is metered: the system records the number of queries, the similarity scores of returned embeddings, and the amount of unique content retrieved per user per unit time.
The system models the expected distribution of these metrics under normal operation.
If a user's query pattern deviates from the model—e.g., a spike in the number of queries returning high-similarity results from protected documents, or a user retrieving documents from multiple roles within an improbably short time—the system degrades the user's access token, requiring re-authentication and possibly involving a human security analyst.
The degradation is not a binary allow/deny but a gradient: access is reduced, the embedding index is coarsened, similarity scores are noised, or the retrieval latency is increased.
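The metering and graded degradation described above could be prototyped along these lines. This is a sketch under stated assumptions: the anomaly score is a plain z-score against a per-user baseline, and the thresholds (2.0 and 4.0), the noise levels, and the names `QueryMeter`, `degrade`, and `noised` are all hypothetical choices, not values taken from the doctrine.

```python
import random
import statistics
from collections import deque


class QueryMeter:
    def __init__(self, baseline, window=5):
        # baseline: historical "documents retrieved per interval" for one user
        self.mean = statistics.mean(baseline)
        self.stdev = statistics.stdev(baseline) or 1.0
        self.recent = deque(maxlen=window)

    def record(self, docs_retrieved):
        self.recent.append(docs_retrieved)

    def anomaly_score(self):
        """z-score of the recent average against the baseline."""
        if not self.recent:
            return 0.0
        return (statistics.mean(self.recent) - self.mean) / self.stdev


def degrade(score, base_k=10):
    """Map an anomaly score to a graded posture rather than allow/deny."""
    if score < 2.0:
        return {"k": base_k, "noise": 0.0, "reauth": False}
    if score < 4.0:
        return {"k": max(1, base_k // 2), "noise": 0.05, "reauth": False}
    return {"k": 1, "noise": 0.2, "reauth": True}


def noised(similarity, noise, rng=random):
    """Blur a similarity score so rankings become less informative."""
    return similarity + rng.uniform(-noise, noise)


meter = QueryMeter(baseline=[4, 5, 6, 5, 4, 6, 5])
for n in (5, 6):
    meter.record(n)
normal = degrade(meter.anomaly_score())

for n in (40, 45, 50):  # sudden bulk retrieval
    meter.record(n)
burst = degrade(meter.anomaly_score())
```

The retrieval path would consult `degrade` on every query and apply the returned `k` and `noise`, so the posture shift happens inline rather than after a log pipeline has caught up.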

This is not detection-and-response in the traditional sense. It is continuous adversarial adjustment. The attacker cannot assume a stable access environment; they must operate under the assumption that the system is observing and shifting its posture. Detection latency—the time between an attacker's action and the system's response—is measured in seconds or milliseconds, not hours or days.

Regulatory and Vendor Incentives

Regulators and vendors have misaligned incentives on this problem. Regulators want organisations to implement formally auditable access controls; auditors want to see RBAC matrices, access-request workflows, and permission-change logs. These are all control-plane signals. Vendors want to ship RAG platforms quickly; they engineer the data plane for performance (low latency, high throughput) and bolt on access control as a secondary layer.

Neither incentive structure rewards architectures that embed access control into the data plane or that prioritise post-breach resistance over pre-breach detection. The Optus breach (2022), which exposed approximately 9.8 million customer records including names, dates of birth, phone numbers, and email addresses, occurred despite the presence of EDR, SIEM, and access-control systems. The vulnerability was reported to be an internet-facing API endpoint that did not require authentication; access controls existed elsewhere in the environment, but they were not applied to the interface that actually served the data. The data plane was exposed. This is precisely the scenario that would repeat under a RAG architecture if access control is not intrinsic to the embedding store.

The path forward requires regulatory clarity. Authorities like the SEC, FCA, and EBA should issue specific guidance on RAG access control, mandating that:

Embeddings derived from sensitive documents must be encrypted at rest with keys independent of the vector-database access-control layer.
Vector databases must not be treated as trust boundaries; the encryption and key derivation must assume potential compromise of the database.
Access-control synchronisation between the primary data store and the vector database must be atomic or near-atomic; windows of inconsistency must be bounded and logged.
Anomaly detection and adaptive posture adjustment must be embedded in the query path, not implemented post-hoc via SIEM.

These requirements would push vendors toward more robust architectures and would make regulatory guidance technically coherent.

Design Principles for RAG-Specific Access Control

An organisation deploying a RAG system today should evaluate its architecture against the following principles, derived from the PULSE doctrine:

Zero-knowledge embedding retrieval: The vector database should not be able to determine which user is performing a query or which documents are being retrieved without cryptographic keys held elsewhere. This can be implemented via encrypted indices, functional encryption, or private information retrieval (PIR) techniques adapted for similarity search.

Data-plane encryption with user-derived keys: Embeddings should be encrypted with keys derived from user credentials or role membership. The encryption should be bound to the user's current permission set: if a user's access is revoked, keys and query embeddings cached before the revocation should no longer decrypt future results.

Separation of metadata and vectors: User/role tags should not be stored alongside embeddings in a queryable structure. Metadata should be stored separately, with strict access controls, and reconciliation should be periodic and audited.

Metered query depth: The system should limit the number of documents a user can retrieve per unit time, the number of distinct semantic-search patterns they can issue, and the total size of unencrypted context returned to the LLM. These limits should be per-user, per-role, and per-application, and should be dynamically adjusted based on anomaly scores.

Atomic permission propagation: When a user's access to a document is revoked in the primary database, that revocation must be reflected immediately in the vector database. This requires either read-your-writes consistency guarantees across both stores or a synchronous update pattern that blocks the primary-store revocation until the vector database confirms the embedding deletion.
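The synchronous variant of this principle can be sketched as follows. The class and function names are hypothetical, both stores are in-memory stand-ins, and a real deployment would also need retries, timeouts, and alerting around the failure path.

```python
class VectorStoreUnavailable(Exception):
    pass


class VectorStore:
    def __init__(self):
        self.entries = {}  # (doc_id, role) -> vector
        self.online = True

    def delete(self, doc_id, role):
        if not self.online:
            raise VectorStoreUnavailable("vector store did not confirm deletion")
        self.entries.pop((doc_id, role), None)


class PrimaryStore:
    def __init__(self):
        self.grants = set()  # {(doc_id, role)}


def revoke(primary, vectors, doc_id, role):
    """Remove the embedding first; commit the revocation only once confirmed."""
    vectors.delete(doc_id, role)            # raises if not confirmed
    primary.grants.discard((doc_id, role))  # commit point


primary, vectors = PrimaryStore(), VectorStore()
grant = ("contract-17", "sales")
primary.grants.add(grant)
vectors.entries[grant] = (0.1, 0.2, 0.3)

vectors.online = False
try:
    revoke(primary, vectors, *grant)
    failed = False
except VectorStoreUnavailable:
    failed = True  # neither store changed, so they still agree
consistent_on_failure = (grant in primary.grants) == (grant in vectors.entries)

vectors.online = True
revoke(primary, vectors, *grant)
```

The design choice here is that a failed revocation leaves both stores granting access, visibly and consistently, instead of leaving the primary store revoked while the embedding remains retrievable. Whether that trade-off is acceptable depends on how quickly the failure is surfaced to an operator.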

Invitation to Engagement

Organisations holding or transferring sensitive data that are deploying or planning RAG systems should request a technical briefing under executed Mutual NDA to explore sovereign digital infrastructure approaches that assume vector-database compromise from day one.

