Threat-Modeling an LLM Feature for a Regulated Client: A Methodology
A regulated financial entity decides to ship an LLM-powered feature: a support assistant, a document summariser, an internal copilot. Sooner or later, someone (an auditor, a risk committee, a regulator under DORA or NIS2) asks the question every regulated deployment faces: how did you secure it, and how do you know? “We used a reputable model provider” is not an answer. A threat model is.
This is a repeatable methodology for threat-modeling an LLM feature in a context where the answer has to satisfy a regulator, not just a security review. It is deliberately structured so the output is the evidence — the threat model you build is the document you hand to the risk committee.
The organising question: what can it do, and what feeds it?
Everything in LLM threat-modeling reduces to two questions, and the methodology is mostly about answering them rigorously:
- What can the model do? — its actuators. Every tool it can call, every system its output flows into, every action it can trigger. This bounds the consequence of any manipulation.
- What feeds the model? — its data path. Every source of content it processes: user input, retrieved documents, web content, tool results, prior conversation. This bounds the attack surface for manipulation.
The actuator question determines how bad a compromise can be. The data-path question determines how a compromise arrives. A feature with no actuators and a tiny data path is low-risk almost regardless of the model; a feature with powerful actuators fed by external content is high-risk almost regardless of how good the model is. Threat-modeling starts by answering both precisely, because the entire risk profile follows from them.
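A minimal sketch of how the two questions combine into a first-pass triage. The `FeatureProfile` type, the 0 to 3 severity scale, and the thresholds are illustrative assumptions, not part of the methodology; the only thing taken from the text is the shape of the rule (no actuators plus a tiny data path is low, powerful actuators plus external content is high).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureProfile:
    name: str
    max_actuator_severity: int  # hypothetical scale: 0 = no actuators, 3 = high consequence
    untrusted_sources: int      # number of untrusted inputs on the data path


def risk_tier(profile: FeatureProfile) -> str:
    """First-pass triage only; the thresholds here are assumptions."""
    if profile.max_actuator_severity == 0 and profile.untrusted_sources <= 1:
        return "low"
    if profile.max_actuator_severity >= 2 and profile.untrusted_sources >= 1:
        return "high"
    return "medium"


summariser = FeatureProfile("internal summariser", 0, 1)
agent = FeatureProfile("support agent with refunds", 3, 4)
print(risk_tier(summariser), risk_tier(agent))
```

Note that model quality does not appear as an input: the tier follows from the actuators and the data path alone, which is the point of the two questions.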
The methodology, step by step
Step 1 — Map the data path completely. Enumerate every source of content the model processes, and classify each as trusted or untrusted. User input: untrusted. Retrieved documents: untrusted (someone authored them). Web content: untrusted. Tool results: untrusted unless the tool is fully constrained. Prior conversation turns: untrusted, and accumulating. The output of this step is a diagram: every arrow into the model, labelled with its trust level. Most LLM risk enters through an arrow someone forgot to draw.
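The data-path diagram can also be kept as a small machine-checkable inventory. The sketch below is one possible encoding, assuming a hypothetical support-assistant feature; the `system_prompt` entry and the `undeclared_sources` helper are illustrative additions, while the trust labels for the other sources follow the classification above.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


@dataclass(frozen=True)
class Source:
    name: str
    trust: Trust
    note: str = ""


# One entry per arrow into the model.
DATA_PATH = [
    Source("system_prompt", Trust.TRUSTED, "assumed: authored and versioned by the team"),
    Source("user_input", Trust.UNTRUSTED),
    Source("retrieved_documents", Trust.UNTRUSTED, "someone authored them"),
    Source("web_content", Trust.UNTRUSTED),
    Source("tool_results", Trust.UNTRUSTED, "unless the tool is fully constrained"),
    Source("prior_turns", Trust.UNTRUSTED, "accumulates across the conversation"),
]


def undeclared_sources(observed: set[str], declared: list[Source]) -> set[str]:
    """Return sources seen in the running system that are missing from the diagram."""
    return observed - {s.name for s in declared}


print(undeclared_sources({"user_input", "email_attachments"}, DATA_PATH))
```

Comparing the declared inventory against what the deployed feature actually reads is one way to catch the forgotten arrow before an attacker does.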
Step 2 — Map the actuator path completely. Enumerate everything the model’s output can reach or trigger. Tools it can call. Systems that consume its output. Actions that fire based on what it produces. For each, ask: what is the worst a manipulated model could make this do? The output is a second diagram: every arrow out of the model, labelled with consequence severity. An actuator with high consequence severity is where your controls must concentrate.
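The actuator diagram lends itself to the same treatment. In this sketch the three actuators, their worst cases, and the three-level `Severity` scale are hypothetical examples for a support assistant; a real inventory would list whatever the feature can actually reach.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Actuator:
    name: str
    worst_case: str  # the answer to "what is the worst a manipulated model could make this do?"
    severity: Severity


# One entry per arrow out of the model (hypothetical examples).
ACTUATORS = [
    Actuator("draft_reply", "misleading text shown to a human agent for review", Severity.LOW),
    Actuator("update_ticket", "ticket closed or rerouted incorrectly", Severity.MEDIUM),
    Actuator("issue_refund", "unauthorised money movement", Severity.HIGH),
]


def control_priorities(actuators: list[Actuator]) -> list[Actuator]:
    """Order actuators so the highest-consequence ones are reviewed first."""
    return sorted(actuators, key=lambda a: a.severity, reverse=True)


for a in control_priorities(ACTUATORS):
    print(a.severity.name, a.name, "->", a.worst_case)
```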
Step 3 — Place the generation/actuation boundary. This is the central architectural decision. Between the model (manipulable) and every high-consequence actuator, there must be a boundary where something other than the model validates the action — a constrained policy check, a human authorisation, or both, scaled to consequence. The model proposing an action is safe; the model’s output triggering a consequential action without a gate is the core failure mode. The output of this step is the boundary, explicitly placed, for every actuator above a severity threshold.
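As a rough illustration of the boundary, the sketch below treats the model's output as a proposal and lets a separate function decide what happens to it. Everything named here (`ProposedAction`, `within_refund_limit`, the refund limit of 100, the severity thresholds) is a hypothetical example, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    tool: str
    args: dict
    severity: int  # assumed to come from the actuator inventory, 1 (low) to 3 (high)


def within_refund_limit(action: ProposedAction) -> bool:
    """Constrained policy check written in ordinary code, not evaluated by the model."""
    if action.tool == "issue_refund":
        return 0 < action.args.get("amount", 0) <= 100  # illustrative limit
    return True


def gate(
    action: ProposedAction,
    policy: Callable[[ProposedAction], bool],
    human_approved: bool = False,
    gate_from: int = 2,
    human_from: int = 3,
) -> str:
    """Decide what happens to a proposal; checks scale with consequence."""
    if action.severity < gate_from:
        return "execute"          # below the severity threshold: no gate
    if not policy(action):
        return "deny"             # policy check fails regardless of approval
    if action.severity >= human_from and not human_approved:
        return "needs_human"      # high consequence: policy and a human
    return "execute"


print(gate(ProposedAction("issue_refund", {"amount": 40}, 3), within_refund_limit))
```

The design choice that matters is that `gate` never asks the model whether the action is acceptable; the decision is made by code and people that injected content cannot steer.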
Step 4 — Apply least agency. For every capability and permission the model has, ask whether the function strictly requires it. Strip everything that does not earn its place. This shrinks the actuator path directly — the smaller the blast radius, the less the rest of the model matters. The output is a justified capability list: each permission present, with a reason.
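One simple way to produce the justified capability list is to require a written reason for every permission and strip anything that arrives without one. The capability names and reasons below are hypothetical, and the check only tests that a reason exists; whether the reason is good remains a human review question.

```python
# Requested capabilities mapped to the reason the function requires them.
# An empty reason means nobody could justify the permission.
REQUESTED = {
    "read_ticket": "needed to summarise the customer's issue",
    "draft_reply": "core function of the assistant",
    "delete_ticket": "",
    "read_all_customers": "",
}


def least_agency(requested: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split requested capabilities into granted (with reason) and stripped."""
    granted = {cap: why for cap, why in requested.items() if why.strip()}
    stripped = sorted(set(requested) - set(granted))
    return granted, stripped


granted, stripped = least_agency(REQUESTED)
print("granted:", granted)
print("stripped:", stripped)
```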
Step 5 — Address the per-request surface (OWASP layer). Walk the standard LLM risks against this specific feature: injection via the untrusted data path, insecure output handling into consumers, sensitive disclosure from reachable context, denial of service via expensive requests. Each is now answerable against the concrete data-path and actuator diagrams from steps 1–2. The output is a per-risk disposition: addressed how, residual where.
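The per-risk disposition can be recorded as a small register that flags any risk still lacking either a control or a stated residual. The entries below are illustrative placeholders for a hypothetical feature, not recommended wording; the fourth is deliberately left blank to show how a gap surfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disposition:
    risk: str
    addressed_by: str  # "addressed how"
    residual: str      # "residual where"


DISPOSITIONS = [
    Disposition(
        "prompt injection",
        "untrusted sources labelled; consequential actuators gated",
        "model can still be steered in text-only output",
    ),
    Disposition(
        "insecure output handling",
        "output escaped before rendering or downstream use",
        "each new consumer needs its own review",
    ),
    Disposition(
        "sensitive disclosure",
        "context limited to the requesting user's records",
        "anything placed in context can be repeated",
    ),
    Disposition("denial of service", "", ""),  # not yet worked through
]


def open_items(dispositions: list[Disposition]) -> list[str]:
    """Risks missing either a control or an explicitly stated residual."""
    return [d.risk for d in dispositions if not d.addressed_by or not d.residual]


print(open_items(DISPOSITIONS))
```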
Step 6 — Add the trajectory layer (the part most skip). The per-request layer (step 5) does not cover multi-turn assembly — harm built across many individually-benign turns. For a regulated deployment, decide whether conversation-level monitoring is warranted given the actuators and data path. A high-actuator feature needs cumulative-context awareness; a no-actuator summariser largely does not. The output is an explicit decision on trajectory-level controls, with the reasoning — which is itself evidence of a mature threat model.
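To make the difference between the per-request layer and the trajectory layer concrete, here is a minimal sketch of cumulative-context awareness. It assumes some upstream scorer already assigns each turn a risk score between 0 and 1; that scorer, the `ConversationMonitor` class, and both limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConversationMonitor:
    per_turn_limit: float = 0.8    # what a per-request check would catch (assumed value)
    cumulative_limit: float = 1.5  # what only a conversation-level check catches (assumed value)
    total: float = 0.0

    def observe(self, turn_score: float) -> str:
        """Record one turn's risk score and return the action to take."""
        self.total += turn_score
        if turn_score >= self.per_turn_limit:
            return "block_turn"
        if self.total >= self.cumulative_limit:
            return "escalate_conversation"
        return "ok"


monitor = ConversationMonitor()
# Four turns, each individually below the per-turn limit.
results = [monitor.observe(score) for score in [0.4, 0.4, 0.4, 0.4]]
print(results)
```

No single turn trips the per-turn limit, yet the conversation as a whole crosses the cumulative one on the fourth turn. A running sum is the crudest possible aggregate and is used here only to show the shape of the control.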
Step 7 — Map to the regulation. Connect each control to the regulatory demand it satisfies. The supply-chain provenance maps to DORA third-party risk. The data-path isolation maps to data-protection obligations. The actuator gating and least-agency map to operational-resilience and accountability expectations. The output is the regulator-facing layer: not a separate document, but a column on the threat model linking control to obligation.
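The regulator-facing column can be kept as a plain mapping and checked for gaps. The obligations below are deliberately written at the same level of generality as the text above; specific article references are omitted here and would need to be filled in by whoever owns the regulatory reading. The `unmapped` helper and the `conversation monitoring` control are illustrative.

```python
# Control -> regulatory demand it satisfies (one column on the threat model).
CONTROL_TO_OBLIGATION = {
    "supply-chain provenance": "DORA third-party risk",
    "data-path isolation": "data-protection obligations",
    "actuator gating": "operational-resilience and accountability expectations",
    "least agency": "operational-resilience and accountability expectations",
}


def unmapped(controls: list[str], mapping: dict[str, str]) -> list[str]:
    """Controls in the threat model that have no obligation recorded yet."""
    return [c for c in controls if c not in mapping]


implemented = [
    "supply-chain provenance",
    "data-path isolation",
    "actuator gating",
    "least agency",
    "conversation monitoring",  # hypothetical control awaiting a mapping
]
print(unmapped(implemented, CONTROL_TO_OBLIGATION))
```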
Why this structure produces evidence, not just security
The methodology is deliberately ordered so its output is the deliverable a regulated client needs. At the end you have:
- Two diagrams (data path, actuator path) that show you understand both how a compromise can arrive and how far it can reach.
- An explicit generation/actuation boundary that demonstrates the core architectural control.
- A justified capability list that evidences least agency.
- A per-risk disposition, mapped to the OWASP Top 10 for LLM Applications, that covers the recognised per-request surface.
- An explicit trajectory-layer decision that shows maturity beyond the checklist.
- A control-to-regulation mapping that answers the auditor directly.
That set of artifacts is exactly what the risk-committee question demands. The security work and the compliance evidence are the same work, done in the right order. You do not threat-model and then write the compliance document; the threat model is the compliance document if you structure it this way.
The honest residuals
A credible threat model names what it does not close:
- Determined manipulation is assumed possible. The methodology limits consequence rather than promising the model is unmanipulable. For high-actuator features, the residual is “the model can be manipulated, but the gate prevents consequential action without authorisation” — which is a defensible posture, not a claim of perfection.
- The trajectory layer is incomplete. Cross-session assembly remains an open problem (covered in the multi-turn analysis). The honest threat model states this as a known residual, not a closed item.
- Supply-chain trust is partly inherited. Using a third-party model means inheriting risk you monitor but do not fully control — which DORA explicitly recognises as a third-party concern to manage, not eliminate.
Naming these is not weakness; it is what separates a real threat model from a reassurance document, and regulators increasingly know the difference.
The takeaway
Threat-modeling an LLM feature for a regulated client is not mysterious once you anchor it to the two questions — what can it do and what feeds it — and work outward from there to the generation/actuation boundary, least agency, the per-request layer, the trajectory layer, and the regulatory mapping. Structure the work in that order and its output is simultaneously the security architecture and the evidence the regulator wants.
The single sentence to carry: the model is a manipulable component, so threat-modeling it means bounding what feeds it and gating what it can do — and writing that down in a way an auditor can read is the same act as securing it.
An independent piece by johlem.net — IT security consulting, Luxembourg. LLM-feature threat modeling for regulated financial entities. Related tooling at cyberramen.com.