blog.johlem.net

The OWASP LLM Top 10, for Someone Who Has to Defend a Real Deployment

The OWASP Top 10 for LLM Applications is the closest thing the field has to a shared vocabulary for LLM risk. It is also, in most writeups, presented as a glossary — ten terms, ten definitions, a paragraph each. That is useful for awareness and useless for defence. If you actually have an LLM feature in production and someone has made you responsible for it, you do not need definitions. You need to know which of these risks is architectural (designed-in or designed-out), which is operational (managed continuously), and where the list itself leaves you exposed.

This is the list read as a defender’s checklist, from the seat of someone who has to answer for the deployment.

The first reframe: architectural vs. operational

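Before the items, the distinction that organises them. Each LLM risk is predominantly one of two kinds:

  1. Architectural: designed in or designed out. The risk is settled by how the system is built.
  2. Operational: managed continuously. The risk is handled by controls that run, are monitored, and are tuned for as long as the system is live.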

Most real failures come from treating an architectural risk as if vigilance could cover for a bad design, or treating an operational risk as if a one-time control closed it. Sorting the list this way is the first defensive act.

The items, as a defender sees them

Prompt injection. The signature LLM risk: hostile input manipulating the model’s behaviour. This is both — partly architectural (do not put a manipulable model upstream of an actuator; separate generation from action with a human-authorised gate), partly operational (monitor for injection patterns). The architectural half is the one that matters most: if your design lets attacker-controlled text reach a component that can act, no input filtering will fully save you. Design so the model’s output is reviewed before anything irreversible happens. Defender’s question: what can my model’s output trigger without a human in between?
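
A minimal Python sketch of that separation. It assumes the model’s output has already been parsed into a proposed action. The action names, the `SAFE_ACTIONS` set and the `approve` callback are hypothetical stand-ins for a real tool inventory and review step. The point is structural: the model only ever produces a proposal, and anything outside a known-safe set fails closed until a human says yes.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical allowlist of read-only / reversible actions.
# Anything not listed here is treated as consequential (fail closed).
SAFE_ACTIONS = {"search", "summarise"}


@dataclass(frozen=True)
class ProposedAction:
    """What the model produced: a proposal (data), not an effect."""
    name: str
    argument: str


def execute(action: ProposedAction,
            approve: Callable[[ProposedAction], bool]) -> str:
    """Run a proposed action only if it is known-safe or a human approves it.

    `approve` stands in for the deployment's real review step (a UI
    confirmation, a ticket); it is an assumption of this sketch. The return
    value describes the outcome; a real system would dispatch the tool here.
    """
    if action.name not in SAFE_ACTIONS and not approve(action):
        return f"blocked: {action.name} awaits human authorisation"
    return f"executed: {action.name}({action.argument})"


def deny_all(action: ProposedAction) -> bool:
    """A reviewer who approves nothing, for demonstration."""
    return False


print(execute(ProposedAction("summarise", "doc-1"), deny_all))
# → executed: summarise(doc-1)
print(execute(ProposedAction("send_email", "cfo@example.com"), deny_all))
# → blocked: send_email awaits human authorisation
```

Defaulting unknown actions to needing approval, rather than to allowed, means a newly added tool stays gated until someone deliberately classifies it as safe.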

Insecure output handling. Treating model output as trusted when it is, in fact, attacker-influenceable. Architectural. The model’s output is untrusted input to whatever consumes it — sanitise and validate it exactly as you would any external input. The classic failure is piping model output into a downstream system (a shell, a query, a renderer) that trusts it. Defender’s question: does anything downstream of the model trust its output implicitly?
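
As an illustration, here is what treating output as untrusted can look like for two common consumers, a SQL query and an HTML renderer, using only Python’s standard library (`sqlite3`, `html`). The `customers` table and the function names are hypothetical. This covers injection into those two sinks only; it does not replace validating that the output has the shape the consumer expects.

```python
import html
import sqlite3


def lookup_customer(conn: sqlite3.Connection, model_output: str) -> list[str]:
    """Query using model output as a bound parameter, never as SQL text."""
    # The placeholder makes the driver treat the value as data, so text such as
    # "x' OR '1'='1" is compared literally instead of altering the query.
    rows = conn.execute(
        "SELECT name FROM customers WHERE name = ?", (model_output,)
    ).fetchall()
    return [row[0] for row in rows]


def render_answer(model_output: str) -> str:
    """Escape model output before it reaches an HTML renderer."""
    return f"<p>{html.escape(model_output)}</p>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.execute("INSERT INTO customers VALUES ('alice')")
print(lookup_customer(conn, "alice' OR '1'='1"))  # → []
print(render_answer("<script>alert(1)</script>"))
# → <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```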

Training-data poisoning. Corrupting the data the model learned from. Mostly architectural and supply-chain — for teams using models rather than training them, this collapses into provider/supply-chain trust. Defender’s question: do I know the provenance of the model and any data I fine-tune on?
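
One narrow, concrete piece of that provenance question, sketched under assumptions: if every fine-tuning record carries a `source` field (a hypothetical schema), records from unreviewed sources can be quarantined before training. This checks declared provenance only; it does not detect poisoned content that arrives through a trusted source.

```python
from typing import Iterable

# Hypothetical allowlist of data sources whose provenance has been reviewed.
TRUSTED_SOURCES = {"internal-kb-2024", "vendor-curated-v3"}


def partition_by_provenance(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split fine-tuning records into (accepted, quarantined).

    A record is accepted only if it names a reviewed source. Records with an
    unknown or missing `source` go to quarantine for review instead of being
    silently trained on.
    """
    accepted, quarantined = [], []
    for rec in records:
        target = accepted if rec.get("source") in TRUSTED_SOURCES else quarantined
        target.append(rec)
    return accepted, quarantined


records = [
    {"text": "...", "source": "internal-kb-2024"},
    {"text": "...", "source": "forum-scrape"},
    {"text": "..."},  # no declared source at all
]
accepted, quarantined = partition_by_provenance(records)
print(len(accepted), len(quarantined))  # → 1 2
```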

Model denial of service. Resource exhaustion via expensive requests. Operational. Rate limits, cost controls, input bounds. Familiar territory — it is availability engineering applied to an expensive compute endpoint. Defender’s question: what is the cost ceiling on a single request, and who can hit it?
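
A sketch of putting a number on that ceiling. It assumes a per-caller token bucket for request rate plus fixed bounds on input size and generated tokens. All limits are hypothetical placeholders, and a character count is only a rough proxy for the tokens a provider actually bills. The clock is injectable so the behaviour can be tested deterministically.

```python
import time
from typing import Callable

MAX_INPUT_CHARS = 8_000    # hypothetical bound; characters are a rough proxy for tokens
MAX_OUTPUT_TOKENS = 1_024  # hypothetical cap to pass to the model call


class TokenBucket:
    """Per-caller limiter: bursts up to `capacity`, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last check, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def admit(request_text: str, bucket: TokenBucket) -> tuple[bool, str]:
    """Apply the input bound first, then charge the caller's bucket."""
    if len(request_text) > MAX_INPUT_CHARS:
        return False, "rejected: input exceeds bound"
    if not bucket.allow():
        return False, "rejected: rate limit"
    return True, f"admitted with max_output_tokens={MAX_OUTPUT_TOKENS}"


bucket = TokenBucket(capacity=5, rate=0.5)
print(admit("What is our refund policy?", bucket))
# → (True, 'admitted with max_output_tokens=1024')
```

The input bound is checked before the bucket is charged, so oversized requests are refused without consuming the caller’s budget. Whatever `MAX_OUTPUT_TOKENS` holds would be passed to the model call as its generation limit, which together with the input bound caps the cost of any single request.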

Supply-chain vulnerabilities. Compromised models, dependencies, plugins. Architectural and operational both — provenance at design time, monitoring over time. For EU financial entities within its scope, DORA’s focus on ICT third-party risk makes this one a compliance concern, not just a security one. Defender’s question: what is in my LLM supply chain and how would I know if a piece of it changed?
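
For the second half of that question, one common approach is to pin SHA-256 digests of model and adapter files and re-check them on deploy or on a schedule. A sketch using `hashlib` and `pathlib`; the manifest format (relative path mapped to hex digest) is an assumption, and the pinned digests are only as trustworthy as the place they were recorded from.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large model files never load whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def detect_drift(pinned: dict[str, str], root: Path) -> list[str]:
    """Return pinned artifacts that are missing or whose digest has changed."""
    changed = []
    for rel_path, expected in pinned.items():
        p = root / rel_path
        if not p.is_file() or sha256_of(p) != expected:
            changed.append(rel_path)
    return changed
```

A non-empty result is the signal the question asks for: something in the supply chain is no longer what was reviewed, and the deployment should stop or alert rather than load it.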

Sensitive information disclosure. The model revealing data it should not — training data, context, other users’ information. Both. Architectural (do not put data in context the user should not access; isolate per-user context), operational (monitor outputs). The architectural half dominates: most disclosure failures are data that should never have been reachable placed where the model could surface it. Defender’s question: is there anything in the model’s reachable context that this user is not entitled to?
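
A minimal sketch of the architectural half. It assumes a retrieval step returns candidate documents tagged with the groups allowed to read them (a hypothetical `Document` type). Entitlement is enforced when the context is assembled, not by instructing the model to withhold anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]


def build_context(candidates: list[Document], user_groups: set[str]) -> str:
    """Place only documents the current user is entitled to into the context.

    The filter runs before the model sees anything: what is not in context
    cannot be disclosed, however the model is prompted.
    """
    permitted = [d for d in candidates if d.allowed_groups & user_groups]
    return "\n\n".join(f"[{d.doc_id}] {d.text}" for d in permitted)


docs = [
    Document("hr-1", "salary bands for 2025", frozenset({"hr"})),
    Document("kb-1", "how to reset the VPN token", frozenset({"all-staff"})),
]
print(build_context(docs, {"all-staff"}))  # → [kb-1] how to reset the VPN token
```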

Insecure plugin/tool design. Tools the model can invoke that do dangerous things with insufficient control. Architectural — and this is the high-severity one for agentic deployments. A tool the model can call is an actuator the model controls, and the model is manipulable. Every tool needs the generation/actuation separation: the model proposes, a constrained layer validates, ideally a human authorises consequential actions. Defender’s question: for each tool, what is the worst thing a manipulated model could make it do?
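
A sketch of the step where a constrained layer validates, under assumptions: a hypothetical tool inventory in which each tool declares exactly which arguments it accepts, a check for each, and whether it is consequential. Unknown tools and unexpected arguments are rejected outright; consequential calls that pass validation are routed to a human rather than executed.

```python
import re
from dataclasses import dataclass
from typing import Callable


def matches(pattern: str) -> Callable[[str], bool]:
    """Build a check that the whole argument matches `pattern`."""
    return lambda value: re.fullmatch(pattern, value) is not None


def amount_ok(value: str) -> bool:
    """ASCII digits only, and a hard ceiling on what the model can request."""
    return re.fullmatch(r"[0-9]{1,3}", value) is not None and 0 < int(value) <= 100


@dataclass(frozen=True)
class ToolSpec:
    validators: dict[str, Callable[[str], bool]]  # exactly the allowed arguments
    consequential: bool                           # route to a human if True


# Hypothetical tool inventory; names, patterns and the 100 EUR cap are illustrative.
TOOLS = {
    "read_ticket": ToolSpec({"ticket_id": matches(r"T-[0-9]{1,8}")}, consequential=False),
    "refund": ToolSpec(
        {"order_id": matches(r"O-[0-9]{1,8}"), "amount_eur": amount_ok},
        consequential=True,
    ),
}


def validate_call(name: str, args: dict[str, str]) -> str:
    """Decide what happens to a tool call the model has proposed.

    `args` is assumed to be already parsed into string values.
    """
    spec = TOOLS.get(name)
    if spec is None:
        return "reject: unknown tool"
    if set(args) != set(spec.validators):
        return "reject: unexpected or missing arguments"
    for key, check in spec.validators.items():
        if not check(args[key]):
            return f"reject: {key} fails constraint"
    return "needs human approval" if spec.consequential else "allow"


print(validate_call("refund", {"order_id": "O-7", "amount_eur": "5000"}))
# → reject: amount_eur fails constraint
```

Validation limits what a manipulated model can ask for (a refund above the cap is refused); the human gate covers calls that are well-formed but still unwanted.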

Excessive agency. Giving the model too much autonomy, permission, or functionality. Architectural. This is the meta-risk behind insecure tools — the more the model can do without a check, the larger the blast radius of a successful manipulation. The defence is the principle of least agency: the model gets the minimum capability the function requires, and consequential actions are gated. Defender’s question: what does my model have permission to do that it does not strictly need?
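
Answering that question can start as a set difference between what each model-backed function has been granted and what it has been shown to need. A sketch with hypothetical function and permission names; producing the `REQUIRED` sets honestly is the hard part, and no code does that for you.

```python
# Hypothetical grants and needs for one model-backed function.
GRANTED = {
    "support_bot": {"read_ticket", "update_ticket", "read_customer",
                    "refund", "delete_customer"},
}
REQUIRED = {
    "support_bot": {"read_ticket", "update_ticket", "read_customer"},
}


def excess_agency(granted: dict[str, set[str]],
                  required: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per function, the capabilities it holds but does not strictly need.

    A function with no declared requirements has every grant reported,
    which errs toward stripping rather than keeping.
    """
    report = {}
    for function, perms in granted.items():
        extra = perms - required.get(function, set())
        if extra:
            report[function] = extra
    return report


print(excess_agency(GRANTED, REQUIRED))
```

Each entry in the report is a candidate for removal: here, refund and delete_customer are reachable by a manipulated model for no functional reason.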

Overreliance. Humans trusting model output too much — acting on hallucinations, unreviewed generation. Operational and organisational. The control is process: human review where it matters, calibrated trust, not treating fluent output as correct output. Defender’s question: where in my workflow does a human act on model output without verifying it?

Model theft. Exfiltration of the model itself. Mostly relevant to those hosting proprietary models; for most deployments this is lower priority than its place in a Top 10 suggests. Defender’s question: is the model itself an asset I am protecting, or am I consuming someone else’s?

Where the list under-covers

The Top 10 is a strong awareness tool with a real blind spot for defenders, and honesty requires naming it: the list is largely single-turn in its framing. Prompt injection, output handling, disclosure — these are mostly described as properties of a request and its handling. But the hardest LLM attacks are multi-turn — harm assembled across a conversation through decomposition, dilution, and context manipulation, where no single request trips any of the ten items.

A defender who checks every box on the Top 10 and stops has addressed the single-turn surface and left the multi-turn surface unexamined. The list tells you to worry about a malicious prompt; it under-emphasises a malicious trajectory spread across many individually benign prompts. That is the gap to fill with conversation-level thinking — trajectory analysis, cumulative-context evaluation — that sits above the per-request items the list enumerates.

(This connects directly to the multi-turn attack analysis I’ve written separately — the OWASP list is the per-request layer; multi-turn is the layer above it that the list does not fully reach.)
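
To make cumulative-context evaluation concrete, a deliberately simple sketch. It assumes some per-turn classifier already yields a risk score in [0, 1] (not shown, and an assumption), keeps a decaying running sum per conversation, and escalates when that sum crosses a budget even though no single turn crossed the per-request threshold. The thresholds and decay factor are hypothetical and would need tuning; real trajectory analysis would look at content, not just scores.

```python
from dataclasses import dataclass

PER_TURN_THRESHOLD = 0.8    # hypothetical: what a per-request filter would block
TRAJECTORY_THRESHOLD = 1.5  # hypothetical: cumulative budget per conversation
DECAY = 0.9                 # hypothetical: how fast older turns fade


@dataclass
class ConversationMonitor:
    """Tracks a decayed running sum of per-turn risk for one conversation."""
    cumulative: float = 0.0

    def observe(self, turn_risk: float) -> str:
        # `turn_risk` is assumed to come from some per-turn classifier, in [0, 1].
        self.cumulative = self.cumulative * DECAY + turn_risk
        if turn_risk >= PER_TURN_THRESHOLD:
            return "block: single turn"
        if self.cumulative >= TRAJECTORY_THRESHOLD:
            return "escalate: trajectory"
        return "ok"


monitor = ConversationMonitor()
print([monitor.observe(0.4) for _ in range(5)])
# → ['ok', 'ok', 'ok', 'ok', 'escalate: trajectory']
```

Five turns at 0.4 each never trip the per-turn check, but the decayed sum passes 1.5 on the fifth turn. That is the per-request blind spot in miniature.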

The defender’s actual checklist

Stripped to what you would actually do with a deployment you own:

  1. Map each item to architectural or operational for your system — fix the architectural ones in design, sustain the operational ones in running.
  2. Find every actuator the model can reach (tools, output consumers, downstream systems) and put a generation/actuation separation in front of each — the model proposes, something constrained validates, a human gates consequential actions.
  3. Audit reachable context for data the current user is not entitled to — disclosure is mostly a data-placement failure.
  4. Apply least agency — strip every permission and capability the function does not strictly require.
  5. Add the layer the list misses — conversation-level monitoring for multi-turn assembly, above the per-request controls.
  6. Treat the LLM supply chain as a DORA third-party concern — provenance and monitoring, with the compliance evidence that implies.
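
Step 1 can start as data. The sketch below encodes the classification argued item by item above using a Python `Flag`, so items that are both land on both work lists. Model theft is left out because this article does not classify it; the mapping reflects this article’s reading and should be redone for your own system.

```python
from enum import Flag, auto


class Kind(Flag):
    ARCHITECTURAL = auto()  # fix in design
    OPERATIONAL = auto()    # sustain in running


A, O = Kind.ARCHITECTURAL, Kind.OPERATIONAL

# This article's reading of each item; redo it for your own system.
CLASSIFICATION = {
    "Prompt injection": A | O,
    "Insecure output handling": A,
    "Training-data poisoning": A,  # mostly, via provider/supply-chain trust
    "Model denial of service": O,
    "Supply-chain vulnerabilities": A | O,
    "Sensitive information disclosure": A | O,
    "Insecure plugin/tool design": A,
    "Excessive agency": A,
    "Overreliance": O,
}


def split_work(classification: dict[str, Kind]) -> tuple[list[str], list[str]]:
    """Return (fix in design, sustain in running); items that are both appear in each."""
    design = sorted(k for k, v in classification.items() if A in v)
    running = sorted(k for k, v in classification.items() if O in v)
    return design, running


design, running = split_work(CLASSIFICATION)
print("Fix in design:", design)
print("Sustain in running:", running)
```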

The takeaway

The OWASP LLM Top 10 is a vocabulary, and vocabulary is necessary but not sufficient. Defending a real deployment means sorting the list into what you build correctly versus what you run continuously, putting a human-gated separation in front of every actuator the manipulable model can reach, and — critically — adding the multi-turn, conversation-level layer the list under-covers.

The single most useful reframe for a defender: the model is a manipulable component, so the whole security question is what it is allowed to do, and what stands between its output and anything irreversible. Answer that for every tool and every output path, add the trajectory layer on top, and you have moved from knowing the ten terms to actually defending the system.


An independent piece by johlem.net — IT security, Luxembourg. LLM-security threat modeling, also documented at cyberramen.com.