Prompt Injection is the New SQL Injection — And We're Making the Same Mistakes

The architectural error we are repeating is structural, not technical — we are bolting guardrails onto systems designed to be permeable.

The parallel between SQL injection and prompt injection is surface-true and therefore dangerous. SQL injection succeeded because input validation was a perimeter problem, and the perimeter was everywhere. We learned, eventually, to separate data from instruction — parameterised queries, least privilege, schema-level access control. Prompt injection is succeeding now because we have designed large language models as instruction-absorption engines, wired them directly into production systems handling sensitive queries and transactions, and then applied the same remediation logic: filters, content policies, jailbreak detection, fine-tuning resistance metrics. This is not a solution. This is watching a bank build thicker doors after admitting the vault has no walls.

The risk is not theoretical. In late 2024, multiple financial services organisations experienced indirect prompt injection attacks through LLM-powered customer service systems — attackers embedded malicious instructions in transaction descriptions, support tickets, and knowledge base entries, causing the models to leak customer account details, override transaction controls, and in one documented case, generate false compliance reports. The exact scope remains under regulatory investigation. Separately, a major UK telecommunications provider discovered in Q1 2025 that adversaries had compromised internal documentation pipelines feeding an AI-driven network troubleshooting system; injected queries led to unauthorised configuration changes across customer accounts. Neither incident has been fully disclosed by the affected organisations. What is known is that standard AI safety measures — OWASP Top 10 for LLM Applications, content filtering, output validation, input sanitisation — failed to prevent either attack. They failed not because the controls were weak, but because they were applied to a fundamentally permeable architecture.

This article examines why the industry's standard response to prompt injection repeats the structural errors of the SQL injection era, and proposes instead an architectural reorientation: data-plane isolation, zero-knowledge token substrates, and domain-specific instruction primitives that make injection, by design, unintelligible.

The Narrative: Prompt Injection as the New Injection Vulnerability

The industry has settled on a reassuring story. In 2022 and 2023, researchers published proof-of-concept attacks on LLM systems — prompt injection, indirect prompts, prompt leakage — showing that adversaries could manipulate LLM outputs by crafting malicious inputs. The OWASP Foundation released the OWASP Top 10 for Large Language Model Applications (versions 1.0 in 2023, 1.1 in 2024), listing prompt injection as the primary risk. Security vendors responded predictably: OpenAI and Anthropic published safety guidelines and red-teaming recommendations; enterprises adopted prompt filtering solutions; threat intelligence firms like CrowdStrike and Mandiant published mitigation playbooks treating prompt injection as a detection and response problem.

The canonical reference is the Wondercraft.ai case of 2023, where researchers demonstrated that an LLM-based customer service system could be tricked into revealing its system prompt and subsequently manipulated into executing unintended functions. Similar proofs-of-concept followed: direct injection (embedding malicious instructions in user input), indirect injection (poisoning documents or knowledge bases that feed the model), and context fragmentation attacks (exploiting the model's inability to reliably distinguish system instructions from user data). By 2024, SANS ISC and several major cloud providers (AWS, Azure, Google Cloud) had published detection signatures and response procedures for prompt injection attempts, treating the problem as one of input validation, output monitoring, and incident response orchestration.

The mistake was subtle: the security community applied the mental model of SQL injection without examining the architectural difference. SQL injection works because SQL is a dual-purpose language — it is simultaneously data and instruction. The solution was to force a hard separation: prepared statements, type systems, least-privilege database roles. Prompt injection works because LLMs are, by design, instruction-absorption engines. They are trained to interpret natural language as goal-directed instruction. The "separation" between system prompt, user input, retrieved documents, and function calls is not structural — it is soft, statistical, and continuously negotiable by anyone who understands the model's training data well enough.

The industry's response has been to add friction to that negotiation: jailbreak detection, prompt hardening, constitutional AI, reinforcement learning from human feedback (RLHF) to encode instruction-following boundaries. These measures improve median-case safety. They do not change the fundamental architecture. A perimeter-hardened instruction-absorption engine is still an instruction-absorption engine. And as with SQL injection in 2005–2010, the gap between median-case and adversary-case grows with investment in filters.

Why Filtered Instruction Absorption Is Not Defence — Structural Failures in the Current Approach

The architectural failure becomes visible when one examines real-world deployments. Most enterprise LLM systems follow a pattern: a foundational model (OpenAI GPT-4, Anthropic Claude, open-source alternatives like Llama 2) is fine-tuned or prompted with domain-specific instructions, connected to a retrieval-augmented generation (RAG) pipeline that pulls external documents, and wired into transaction systems via function-calling interfaces (tool use, API execution, code interpretation). Each layer is treated as a control boundary — the model's safety training is assumed to be the first control, input filtering the second, output validation the third, function-call authorisation the fourth.

The Synnovis ransomware attack of June 2024 — which disrupted NHS pathology services across south-east England — did not directly involve LLMs, but exposed a principle relevant here: adversaries prioritise the last layer in a chain where human trust is highest. In the Synnovis case, that was Active Directory; in AI-driven systems, it is the output of the LLM itself, which enterprises treat as authoritative precisely because it is generative and appears to reason. That trust is unwarranted. An LLM does not reason; it performs statistical pattern completion conditioned on its training data and current context. If the context can be modified by input, or if the "function" the LLM is asked to call is actually a proxy for sensitive operations, the enterprise is not defending a reasoning engine — it is defending a tokenised decision-maker with a massive input surface.

Consider a concrete case: a financial services firm implements an LLM-powered trade reconciliation system. The LLM is given access to a knowledge base of historical trades, regulatory guidelines, and settlement procedures. Users submit trade data and reconciliation queries; the model generates a summary of discrepancies and recommends actions. The firm applies standard input validation (SQL injection filters, XSS prevention), fine-tunes the model on internal documentation, adds a layer of output validation (checking that recommended actions fall within an approved action set), and implements function-call authorisation (an action only executes if both the LLM output and a human approver consent).

An adversary observes that historical trade data is stored in a company knowledge base that feeds the RAG pipeline. The adversary introduces a subtle inconsistency into a stored settlement procedure document — a deliberately ambiguous phrase that, when interpreted by the LLM in a specific context, causes it to recommend an action that aligns with the adversary's interest (e.g., marking a trade as settled when it should not be, or routing a confirmation to an attacker-controlled address). The input filters miss this because there is no injection in the user input; the model's safety training misses it because the recommended action is technically consistent with the injected procedure; the output validation passes because the action is in the approved set; the human approver approves because the model's reasoning, read in isolation, is plausible.

This is prompt injection at the architectural level, not the tactical level. It is invisible to OWASP Top 10 controls because it does not rely on a trick — it relies on the fact that an LLM, given conflicting or ambiguous instructions in its context, will resolve the conflict in the direction of whichever instruction is most statistically similar to its training data.

The standard response — improve the model, strengthen the filters, add more approval layers — deepens the vulnerability by increasing the surface and the opacity. Every filter is a new parameter space an adversary can learn. Every approval layer is a human decision-maker who must consume and trust a generated artefact that is, by construction, designed to be persuasive.

The PULSE Reorientation: Substrate Design Over Detection

The solution is not to make the permeable system more selective about what it absorbs. The solution is to eliminate the permeable layer from the data path.

PULSE's doctrine rests on a principle that SQL injection eventually taught the database community: you cannot steal, manipulate, or exploit what is not accessible in the first place. For SQL injection, this meant separating data from instruction at the language level (parameterised queries), the access level (least privilege per role), and the execution level (no dynamic code generation). For prompt injection in transaction systems, it means a comparable architecture: data-plane isolation, zero-knowledge token substrates, and domain-specific instruction primitives that are unintelligible to general-purpose language models.

The first principle is data-plane separation. An LLM should never have direct access to sensitive transactional data, live accounts, or authorization contexts. Instead, the LLM operates in a purely generative plane — it produces structured summaries, recommendations, or analysis based on derived, ephemeral, time-limited views of data. Those views are not documents or context; they are cryptographically signed summaries that the LLM can reference but not modify or reinterpret. If the LLM attempts to reference a summary it should not have access to, the signature verification fails at the data plane, not at the model layer. The LLM never learns that the data existed.

The second principle is zero-knowledge token substrates. Instead of asking an LLM to decide what action to take (and then filtering that decision), provide the LLM with a set of action tokens — cryptographic commitments to specific, pre-authorised outcomes. The LLM's task is classification, not decision-making: given a query, emit the identifier of the appropriate token. The actual execution happens by dereferencing the token, which requires authentication and authorisation at the data plane. If the LLM attempts to emit a token ID that does not exist, or to reason its way into a different action, the transaction fails silently. There is no "advisory" output that a human must then act upon — there is only successful authorised action or non-action.

The third principle is domain-specific instruction primitives. Rather than using a general-purpose language model as the decision-maker, define a narrow instruction set for the specific domain (trade reconciliation, customer service, claims processing) and train or fine-tune the LLM only on that primitive set. The instruction set is not natural language — it is a formal grammar with a bounded token space, no self-reference, no context modification, and no variable-length input. An LLM operating under such constraints cannot be prompted into novel behaviours because novel behaviours are not expressible in the grammar. This is not a filter; it is an elimination of the degrees of freedom that make injection possible.

Architectural Blueprints: Concrete Design Patterns

A financial services firm building a customer service system under this doctrine would structure it as follows:

Data plane: Customer account data, transaction history, and regulatory constraints are held in a secure enclave (confidential compute, SGX, or equivalent) and never exposed to the LLM. The LLM receives only a zero-knowledge proof that a query is valid — a cryptographic token that asserts "this customer's account exists and is in good standing" without revealing account details.

Token substrate: The LLM is trained to classify customer intents (request balance, initiate payment, dispute transaction, request statement) into a fixed token set. Each token corresponds to a pre-authorised workflow. When the LLM emits a token, the data plane verifies that the customer has authorisation for that workflow and executes it. If the LLM attempts to emit a token that does not exist, or to construct a novel action, the transaction fails. The customer receives a response: "I can help with balance inquiries and payments, but not with that request."

Domain primitives: The LLM is fine-tuned on a formal grammar with five statement types: CLASSIFY_INTENT(user_input), RETRIEVE_SUMMARY(token_id), REQUEST_ACTION(token_id), ESCALATE(reason), and CLARIFY(ambiguity). Each primitive has a bounded token space and no self-reference. When the LLM generates a response, it outputs only those primitives. An LLM attempting to inject a new primitive, or to reason about data outside its token space, produces invalid output that the execution layer rejects.

This design does not filter prompt injections. It makes prompt injections incoherent. An adversary cannot inject malicious instructions into natural language because the LLM does not process arbitrary natural language — it processes only a formal grammar. An adversary cannot manipulate the data plane because the LLM has no visibility into it. An adversary cannot social-engineer a human approver because there is no human-readable advisory layer — there is only token emission and authorised action.

The governance implication is significant. Under current approaches, an organisation must apply a patchwork of controls (input filtering, RLHF safety training, output validation, human approval) and hope the combination is sufficient. Under a PULSE architecture, the organisation defines a formal threat model for the LLM layer in isolation — "the LLM can emit any token, at any time, in any sequence" — and the data plane and transaction layer are designed to be indifferent to that threat. The LLM becomes a stateless classifier, not a decision-maker. Its compromise does not cascade.

Regulatory Alignment and Implementation Friction

The Change Healthcare ransomware attack of February 2024 — which disrupted pharmacy networks and medical record access across the United States for weeks — highlighted a cascading failure pattern: a compromised vendor system gained trust-based access to downstream critical infrastructure, and that access was insufficiently grained. The healthcare industry's regulatory response (via HHS and state Attorneys General) has emphasised vendor segmentation and zero-trust architecture. The PULSE doctrine aligns with that response: an LLM system should be designed under the assumption that it is compromised, and that assumption should be baked into the architecture, not layered on top.

For financial services firms (subject to FCA SM&CR, PRA, and DORA requirements), the implications are direct. DORA's Article 18 on ICT third-party risk and Article 17 on incident reporting require organisations to maintain "control of its ICT systems" and to report incidents that cause "material unavailability of information systems." A prompt injection attack that causes an LLM to misclassify a transaction, leak customer data, or generate a false compliance report meets that threshold. A DORA-compliant architecture for LLM systems would be one in which the LLM layer is treated as an untrusted vendor — segregated, monitored, and unable to access data or execute transactions directly. The PULSE pattern achieves that by design.

For healthcare and life sciences (HIPAA, HITECH, UK DHSC Security Standards, MAS TRM), the principle is similar: PHI and PII must be architected into isolation, not defended by filters. For telecommunications and critical infrastructure (NIS2, NYDFS Part 500, APRA CPS 234), the risk appetite is now zero for systems where an adversary can manipulate inputs and influence sensitive operations. An LLM in the data path violates that constraint unless it is architected as a stateless classifier over a zero-knowledge substrate.

Implementation friction is real. Most enterprise LLM deployments today (October 2024) are integrated via API calls to commercial models, with retrieval-augmented generation pipelines pulling from unstructured knowledge bases. The PULSE architecture requires a shift: moving to self-hosted or purpose-built models, constructing formal grammars for domain primitives, building secure enclaves for sensitive data, and establishing cryptographic token systems for authorisation. This is not a configuration change; it is a systems redesign.

But the alternative is to accept that prompt injection will follow the SQL injection trajectory: ten to fifteen years of partial fixes, regulatory pressure, and breach cascades before the industry agrees that the architecture must change. We have the opportunity to compress that timeline.

Call to Rigorous Engagement

Organisations operating critical financial, healthcare, or infrastructure systems that are currently evaluating or have deployed LLM-based decision-making, transaction, or data-access systems are invited to request a structured technical briefing under Mutual NDA to examine threat models, architectural alternatives, and implementation roadmaps aligned with PULSE doctrine and domain-specific regulatory requirements.

ai-security

Engagement

Request a briefing under executed Mutual NDA.

PULSE engages only with verified counterparties. Strategic briefing material — reference architecture, regulatory mapping, deployment topology — is released after counter-execution of the NDA scoped to the recipient's evaluation purpose.

Request Briefing →