blog.johlem.net

Phishing Takedown: The Operational Reality

Phishing takedown looks simple from the outside: find the bad site, email the host’s abuse contact, and the site goes away. Anyone who has run it at scale knows that detection is the easy part. The operational reality is a chain of less glamorous problems: verifying whom to contact, drafting something a registrar will act on, and above all building a pipeline that automates the tedium without automating away the human judgment that keeps it safe.

This is a methodology piece, not a product. It describes how a takedown pipeline should be architected — particularly the parts that matter when the drafting is assisted by an LLM, where the failure modes are subtle and the consequences of getting them wrong are real.

The lifecycle, honestly

A takedown is a pipeline with five stages, and the difficulty is unevenly distributed:

  1. Detect — identify the phishing URL/infrastructure. (The easy part.)
  2. Verify the abuse contact — determine who can actually action a takedown for this asset. (Harder than it looks.)
  3. Draft — produce a report the recipient will act on. (Easy to do badly.)
  4. Approve — a human authorises the specific outbound action. (The safety-critical gate.)
  5. Send — deliver the report. (Must be dumb and unforgeable.)

Most of the value — and all of the risk — is in stages 2, 4, and 5.
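One way to make the ordering concrete is to refuse any transition that skips a stage. The sketch below is illustrative only: the Stage enum and the advance function are hypothetical names, and a real case tracker would carry far more state than this.

```python
from enum import IntEnum


class Stage(IntEnum):
    # The five stages in the order given above (names are illustrative).
    DETECT = 1
    VERIFY_CONTACT = 2
    DRAFT = 3
    APPROVE = 4
    SEND = 5


def advance(current: Stage, requested: Stage) -> Stage:
    """Allow only the immediate next stage; refuse skips such as DRAFT -> SEND."""
    if requested != current + 1:
        raise ValueError(
            f"illegal transition {current.name} -> {requested.name}"
        )
    return requested
```

Under this sketch, a case in DRAFT can only move to APPROVE, so there is no code path from a draft straight to delivery.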

Stage 2: abuse-contact verification is the real work

You must not send a takedown to the wrong party, and you especially must not let an automated system invent a contact. The discipline here comes down to one strict rule: abuse contacts come only from authoritative sources (RDAP and IANA), never from anything an LLM generated, inferred, or scraped.

Why this rigidity matters: an LLM drafting a report is perfectly capable of confidently producing a plausible-looking abuse address that does not exist, or worse, belongs to an uninvolved third party. If the contact resolution is left to a generative step, you have built a system that can spam innocent parties with abuse reports at scale and with great confidence. That is not a takedown pipeline; it is a liability generator.

The rule: contact resolution is a deterministic lookup (RDAP, IANA registries), never a generative inference. The LLM may help draft the body. It must never decide the recipient.
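As a rough illustration of what "deterministic lookup" can mean in practice, the sketch below pulls abuse emails out of an already-fetched RDAP response. It relies on the RDAP JSON layout, in which entities carry a roles list and a jCard in vcardArray. The function names and the fail-closed policy are assumptions made for this example, and fetching the response is left out.

```python
from typing import Any


def extract_abuse_emails(rdap_object: dict[str, Any]) -> list[str]:
    """Collect emails of entities whose roles include 'abuse'.

    Recurses into nested entities, because the abuse contact is often
    nested under another entity such as the registrar.
    """
    found: list[str] = []
    for entity in rdap_object.get("entities", []):
        if "abuse" in entity.get("roles", []):
            vcard = entity.get("vcardArray", ["vcard", []])
            for prop in vcard[1]:
                # jCard property layout: [name, params, type, value]
                if len(prop) >= 4 and prop[0] == "email" and isinstance(prop[3], str):
                    found.append(prop[3].strip().lower())
        found.extend(extract_abuse_emails(entity))
    # De-duplicate while preserving first-seen order.
    return list(dict.fromkeys(found))


def resolve_recipients(rdap_object: dict[str, Any]) -> list[str]:
    """Fail closed (assumed policy): no abuse contact means no recipient."""
    emails = extract_abuse_emails(rdap_object)
    if not emails:
        raise LookupError("no abuse contact in RDAP response; escalate to a human")
    return emails
```

Nothing in this path takes free text as input: the recipient is either present in the registry data or the case stops and waits for a person.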

Stage 4: the approval gate is the whole safety model

This is the architectural heart, and it is worth being precise. The pattern: detect → draft → notify a human → human issues an APPROVE token → a non-LLM daemon validates the token → an isolated sender delivers. Nothing leaves the system without a human authorising that specific action.

The token is not a rubber stamp. It binds to the specific case and the specific content — a case ID and a hash of the exact draft being approved. This matters for a reason that only becomes obvious once you think adversarially: it prevents a time-of-check-to-time-of-use problem where the thing approved and the thing sent diverge. The human approves draft-with-hash-X; the daemon will only send a payload whose hash is X. If anything changed between approval and send, the token does not validate and nothing goes out.
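A minimal sketch of such a binding, assuming an HMAC-SHA256 token over the case ID and the SHA-256 hash of the draft. The token format, the function names, and the use of a shared secret key are illustrative assumptions, not a prescribed scheme. A signature scheme that keeps the approval key away from the sender would be a reasonable variation.

```python
import hashlib
import hmac


def draft_hash(draft: bytes) -> str:
    """Hex SHA-256 of the exact draft bytes shown to the approver."""
    return hashlib.sha256(draft).hexdigest()


def issue_approve_token(key: bytes, case_id: str, draft: bytes) -> str:
    """Bind the approval to one case and one exact draft (assumed format)."""
    message = f"APPROVE|{case_id}|{draft_hash(draft)}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def token_is_valid(key: bytes, case_id: str, payload: bytes, token: str) -> bool:
    """Recompute the token from the payload about to be sent and compare."""
    expected = issue_approve_token(key, case_id, payload)
    # Constant-time comparison on bytes.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

Because the hash is recomputed from the outgoing payload at validation time, editing even one byte of the draft after approval makes the token fail.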

This gate is what makes the whole pipeline safe to automate. Everything before it can be assisted, generated, suggested. The gate is the point where a human takes responsibility for an irreversible outbound action, and the binding makes that responsibility precise rather than nominal.

Stage 5: the sender must be incapable of judgment

The final design principle is counterintuitive: the sender should be the dumbest component in the system. No LLM. No shared credentials with the generative parts. No ability to make decisions. It does exactly one thing: validate an APPROVE token against a case-id and draft-hash, and if valid, deliver that exact payload.
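To show how little the sender needs to know, here is a sketch of its entire decision logic. It assumes an HMAC-SHA256 approval token computed over the case ID and the payload hash, which is an illustrative format. The function name send_if_approved and the injected deliver callable are hypothetical.

```python
import hashlib
import hmac
from typing import Callable


def send_if_approved(
    key: bytes,
    case_id: str,
    payload: bytes,
    token: str,
    deliver: Callable[[bytes], None],
) -> bool:
    """Deliver the payload only if the token matches this case and payload.

    The function never parses or interprets the payload text; it only
    hashes it. Returns True if delivered, False if refused.
    """
    digest = hashlib.sha256(payload).hexdigest()
    message = f"APPROVE|{case_id}|{digest}".encode("utf-8")
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8")):
        return False  # fail closed: nothing goes out
    deliver(payload)
    return True
```

There is no branch here that depends on what the payload says, only on whether its hash matches what a human approved.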

The reason is prompt-injection resistance. The phishing content you are reporting on is hostile input. A phishing page is, by definition, content crafted to manipulate. If your drafting LLM ingests that page and the sender sits downstream of the LLM with the ability to act, you have created a path by which hostile content could attempt to influence outbound actions. By making the sender a non-LLM daemon that only honours cryptographically bound human approvals, you sever that path entirely. Hostile input can influence a draft (which a human then reviews), but it can never reach an actuator, because the actuator only obeys tokens, not text.

Supporting properties that follow from this stance:

The design principle underneath all of it

Every choice above flows from one idea: separate the parts that can be manipulated from the parts that can act. The LLM can be manipulated (it ingests hostile content), so it must not act. The sender can act, so it must not be manipulable (no LLM, only token validation). The human bridges them, and the token binding makes that bridge precise.

This is the same principle that governs any system mixing generative assistance with real-world consequences: generation and actuation must be separated by a human-authorised, content-bound gate, and the actuator must be too simple to be talked into anything.

The takedown is, in the end, a small instance of a large problem — how to get the leverage of automation without inheriting the risk of automation acting on hostile input. Solve it here and the pattern transfers to every other “detect → draft → send” workflow you will ever build.

What this is not

Deliberately, this piece describes the architecture and the reasoning, not a runnable pipeline. The safety of a takedown system lives in its design discipline — verified contacts, a content-bound human gate, a non-actuating LLM, an isolated dumb sender. Those are the parts worth understanding. The implementation details are an exercise for a team that has internalised why each constraint exists, because a takedown pipeline built without understanding the constraints is more dangerous than no pipeline at all.


An independent piece by johlem.net — IT security consulting, Luxembourg. Phishing detection and abuse-takedown methodology for regulated financial clients.