blog.johlem.net

The One LLM Security Pattern That Covers Most of the Others: Separate Generation From Actuation

There is a single architectural principle that, once internalised, covers a surprising fraction of LLM security. It is not a model-level fix, not a prompt-engineering trick, and not a filter. It is a structural decision about what the model is allowed to do directly: the component that generates must not be the component that acts, and between them sits a content-bound gate — a human authorisation, a constrained policy check, or both.

This sounds simple to the point of being obvious. It is also the principle that most consequential LLM security failures violate, and getting it right neutralises an entire class of attacks that no amount of model-level hardening fully addresses.

The fundamental fact: the model is manipulable

Start from the one thing you must assume: the model can be manipulated. Through its user (jailbreaking), through hostile content in its data path (prompt injection), through multi-turn assembly — there are multiple routes, and you cannot assume any of them is fully closed. A determined adversary can influence what the model produces.

If you accept that premise — and you must — then the security of any system built on the model is determined not by whether the model can be manipulated (assume yes) but by what a manipulated model can cause to happen. And that is an architectural question, entirely within your control, regardless of how good or bad the model’s own defences are.

This is the pivot. You cannot guarantee the model behaves. You can guarantee the model’s output cannot directly trigger consequences. The first is a model problem you do not control; the second is an architecture problem you do.

The pattern

The structure:

  1. Generation — the model produces output: a proposed action, a draft, a suggested command, a recommendation. This component is manipulable, and you treat it as such. It proposes; it does not dispose.
  2. The gate — between generation and actuation sits a check that is not the model. It validates the proposed action against policy, or routes it to a human for authorisation, or both — scaled to the consequence of the action. Critically, the gate’s authorisation is bound to the specific content it approved.
  3. Actuation — the component that actually does the thing (sends the email, runs the command, modifies the system, calls the tool) is dumb. It does not generate, does not decide, does not interpret. It only executes what the gate authorised, and only the exact thing the gate authorised.

The manipulable component (generation) cannot reach the consequential component (actuation) except through the gate. Hostile influence can shape a proposal, which the gate then catches. It cannot shape an action, because actions only come from the gate, and the gate is not manipulable by the model’s output.
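To make the three roles concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: the `Proposal` type, the `AUTO_APPROVED` and `NEEDS_HUMAN` sets, the action names, and the `generate` stub that stands in for a model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Proposal:
    action: str    # hypothetical action name, e.g. "send_email"
    argument: str  # hypothetical single argument


# Hypothetical policy: low-consequence actions pass automatically,
# consequential ones need a human, everything else is refused.
AUTO_APPROVED = {"read_calendar"}
NEEDS_HUMAN = {"send_email"}


def generate(untrusted_input: str) -> Proposal:
    """Stand-in for the model. It is manipulable, so it only proposes."""
    if "ignore previous instructions" in untrusted_input.lower():
        # Simulate a successful injection: the model proposes something hostile.
        return Proposal("delete_mailbox", "all")
    return Proposal("read_calendar", "today")


def gate(p: Proposal, ask_human: Callable[[Proposal], bool]) -> Optional[Proposal]:
    """Not the model: a fixed policy, plus a human for consequential actions."""
    if p.action in AUTO_APPROVED:
        return p
    if p.action in NEEDS_HUMAN and ask_human(p):
        return p
    return None  # unknown or refused: nothing is authorised


def actuate(p: Proposal) -> str:
    """Dumb executor: it does not generate, decide, or interpret."""
    return f"executed {p.action}({p.argument})"


def run(untrusted_input: str, ask_human: Callable[[Proposal], bool]) -> str:
    approved = gate(generate(untrusted_input), ask_human)
    return actuate(approved) if approved is not None else "blocked"
```

In this sketch the only call to `actuate` goes through `gate`, so a manipulated `generate` can change what is proposed but not what runs. Even a human who approves everything does not help the injected `delete_mailbox` proposal, because the policy never routes unknown actions to the human in the first place.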

Why content-binding matters

The subtle, essential detail: the gate’s authorisation must bind to the specific content it approved — a hash, a precise reference, something that ties “approved” to “this exact thing and nothing else.”

Without binding, you have a time-of-check to time-of-use (TOCTOU) gap. The gate approves a proposal; between approval and execution, the proposal could change (through a race, through further manipulation, through a bug), and the actuator would execute the changed version under the old authorisation. Binding closes this: the actuator will only execute content matching what was authorised. Approve payload-with-hash-X, and only payload-with-hash-X executes. Anything else fails to validate and nothing happens.

This is what turns the gate from a rubber stamp into a real control. A human (or policy) does not approve “an action in general” — they approve this specific action, and the binding guarantees that specific action is what runs. It makes the authorisation precise rather than nominal.
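One possible way to implement the binding is sketched below in Python. Note the substitution: the text speaks of a hash, and this sketch uses a keyed hash (HMAC-SHA256) so that only the gate can mint a valid authorisation. The key `GATE_KEY`, the function names, and the payload fields are hypothetical, and the sketch assumes the key is held by the gate and the actuator and is never visible to the model.

```python
import hashlib
import hmac
import json

# Assumption: shared by gate and actuator only, never exposed to the model.
GATE_KEY = b"demo-key-held-by-gate-and-actuator"


def canonical(payload: dict) -> bytes:
    """Serialise deterministically so equal payloads give equal bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def authorise(payload: dict) -> str:
    """Gate side: issue a token bound to this exact payload."""
    return hmac.new(GATE_KEY, canonical(payload), hashlib.sha256).hexdigest()


def execute(payload: dict, token: str) -> str:
    """Actuator side: recompute the binding and refuse on any mismatch."""
    expected = hmac.new(GATE_KEY, canonical(payload), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token):
        return "rejected"
    return f"ran {payload['command']}"
```

With this shape, a token issued for `{"command": "restart", "target": "web-1"}` validates only against that payload. If anything changes the target after approval, the recomputed digest differs and `execute` returns `"rejected"` instead of acting.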

How much of the threat landscape this covers

Walk the LLM risks against this pattern and watch how many it touches:

Prompt injection — injection’s danger is hostile data influencing the model into doing something. With generation/actuation separation, injection can influence a proposal, which the gate catches before any action. The injection’s payload never reaches an actuator. Largely neutralised architecturally.

Insecure tool / plugin design — a tool the model can call is an actuator. Put every consequential tool behind the gate (model proposes the call, something constrained validates, consequence is authorised) and “the model called a dangerous tool because it was manipulated” stops being possible. Directly addressed.
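As a sketch of what "something constrained validates" could look like for tool calls, the Python below checks each proposed call against a fixed registry before anything runs. The tool `lookup_order`, the `REGISTRY` layout, and the call format (`{"tool": ..., "args": ...}`) are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict


def lookup_order(order_id: str) -> str:
    """Hypothetical low-consequence tool."""
    return f"order {order_id}: shipped"


def _valid_order_args(args: Dict[str, Any]) -> bool:
    """Accept exactly one argument, and only an ASCII-digit order id."""
    oid = args.get("order_id")
    return (
        set(args) == {"order_id"}
        and isinstance(oid, str)
        and oid.isascii()
        and oid.isdigit()
    )


# tool name -> (argument validator, implementation)
REGISTRY = {"lookup_order": (_valid_order_args, lookup_order)}


def dispatch(call: Dict[str, Any]) -> str:
    """Validate a model-proposed tool call; run it only if it passes."""
    name = call.get("tool")
    if not isinstance(name, str) or name not in REGISTRY:
        return "denied: unknown tool"
    validator, impl = REGISTRY[name]
    args = call.get("args")
    if not isinstance(args, dict) or not validator(args):
        return "denied: bad arguments"
    return impl(**args)
```

The model never holds a reference to `lookup_order` itself; it can only emit a call description that `dispatch` accepts or refuses. A more consequential tool would, in this design, add a human authorisation step on top of the argument check.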

Excessive agency — this pattern is the answer to excessive agency: the model’s agency is reduced to proposing, and consequence requires passing the gate. The blast radius of a manipulated model shrinks to “it suggested something bad that got caught.”

Insecure output handling — output flowing into a consumer is a form of actuation. Treating output as a proposal to be validated before a downstream system trusts it is the same pattern applied to the output path.
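Applied to the output path, the same idea might look like the Python sketch below: the model's text is parsed against a narrow expected shape before a downstream system trusts it, and escaped before it is rendered. The ticket-labelling scenario, `ALLOWED_LABELS`, and both function names are hypothetical; only `json.loads` and `html.escape` are real standard-library calls.

```python
import html
import json
from typing import Optional

# Hypothetical closed set of values the downstream system accepts.
ALLOWED_LABELS = {"billing", "technical", "other"}


def parse_ticket_label(model_output: str) -> Optional[str]:
    """Accept the output only if it is exactly {"label": <allowed value>}."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != {"label"}:
        return None
    label = data["label"]
    if isinstance(label, str) and label in ALLOWED_LABELS:
        return label
    return None


def render_summary(model_output: str) -> str:
    """Escape model text before placing it in HTML, so it stays data."""
    return "<p>" + html.escape(model_output) + "</p>"
```

Anything the model emits outside the expected shape is dropped rather than interpreted, and free text that must be shown to a user is treated as data, not markup.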

Jailbreaking-to-harm — a jailbroken model that can only propose and not act has a sharply limited harm ceiling. Separation converts “jailbreak → consequence” into “jailbreak → a caught proposal.”

One architectural pattern meaningfully addresses five distinct risk categories. That is why it is the pattern: it operates at the structural level where many model-level threats converge on the same chokepoint — the boundary between thinking and doing.

What the pattern does not solve

Honesty about the limits:

The takeaway

If you build with LLMs and you remember one principle, make it this: assume the model is manipulable, and architect so that a manipulated model can only propose, never directly act — with a content-bound gate, human or policy, between proposal and action. This single structural pattern addresses prompt injection, insecure tools, excessive agency, output handling, and the harm-ceiling of jailbreaking, because it operates at the chokepoint where all of those converge: the boundary between the model thinking and the system doing.

The model’s own defences will always be imperfect. The architecture around it does not have to be. Separate generation from actuation, bind the authorisation to the content, and make the actuator too dumb to be talked into anything — and most of the LLM threat landscape stops being able to reach the parts that matter.


An independent piece by johlem.net — IT security, Luxembourg. LLM-security architecture, also documented at cyberramen.com.