
Your Guardrails Don't Govern Anything

The AI industry has decided that guardrails are governance. Content filters. Prompt shields. Output classifiers. These are useful tools. They are not governance. They operate at a different boundary, answer a different question, and produce a different kind of evidence.

Two boundaries, two problems

Guardrails operate at the model boundary — evaluating what the model says. The prompt going in, the completion coming out. They catch toxic content, jailbreak attempts, off-topic responses, and hallucinated claims. This is content safety. It is necessary work.

Governance operates at the action boundary — evaluating what the system does. The tool invocation, the API call, the state change. It asks a different question: was this specific action authorised under a specific policy, by a specific delegation, at this specific time? And can you prove it?

These are not competing answers to the same question. They are answers to different questions, at different points in the execution path.
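To make the two boundaries concrete, here is a minimal sketch. Everything in it is hypothetical — guardrail_check, ProposedAction, and authorise are illustrations, not any real product's API. The first function only ever sees text; the second only ever sees a proposed action, a policy, and a delegation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Model boundary: a guardrail evaluates what the model says.
def guardrail_check(completion: str) -> bool:
    """Content safety: pass/fail on the text itself."""
    banned_patterns = ("drop table", "ignore previous instructions")
    return not any(p in completion.lower() for p in banned_patterns)

# Action boundary: governance evaluates what the system is about to do.
@dataclass
class ProposedAction:
    tool: str            # e.g. "email.send"
    params: dict         # the concrete arguments the system wants executed
    delegation_id: str   # which grant of authority this action claims

def authorise(action: ProposedAction, policy: dict) -> str:
    """Action authority: ALLOW / DENY / ESCALATE on a specific action."""
    rule = policy.get(action.tool)
    if rule is None:
        return "DENY"                           # no policy covers this tool
    if action.delegation_id not in rule["delegations"]:
        return "DENY"                           # not granted under this delegation
    if datetime.now(timezone.utc) > rule["expires"]:
        return "ESCALATE"                       # time bound exceeded: a human decides
    constraint = rule.get("constraint")
    if constraint is not None and not constraint(action.params):
        return "DENY"                           # action violates a policy constraint
    return "ALLOW"
```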

The handoff

An autonomous AI system reads customer financial records and drafts a summary email. A guardrail evaluates the model’s output: no toxic language, no hallucinated data, no prompt injection artefacts. Every content filter passes.

The system then invokes a tool that sends the email — containing those financial records — to an external address.

The guardrail is no longer in the control path. It evaluated the content of the message. It did not evaluate the action of sending sensitive data externally. It was not asked whether this action was authorised. It has no opinion on delegation scope, time bounds, or policy constraints. It produced no evidence of an authorisation decision.

This is not a failure of the guardrail. It is a category error.
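Placed in code, the handoff looks something like the sketch below, reusing the hypothetical ProposedAction and authorise from above; send_email and the policy shape are equally illustrative. The point is placement: the authorisation check runs before the tool executes, on the action itself, no matter how clean the content was.

```python
def send_email(to: str, body: str) -> None:
    ...   # the real tool: an irreversible, external side effect

def governed_call(action: ProposedAction, policy: dict, execute) -> str:
    """The authorisation check sits between deciding to act and acting."""
    decision = authorise(action, policy)        # runs BEFORE any side effect
    if decision == "ALLOW":
        execute(**action.params)
    return decision

summary = "Q3 spending summary for account ..."  # passed every content filter

policy = {
    "email.send": {
        "delegations": {"analyst-session-42"},
        "expires": datetime(2100, 1, 1, tzinfo=timezone.utc),  # illustrative time bound
        # The constraint is on the action, not the content:
        "constraint": lambda p: p["to"].endswith("@ourcompany.example"),
    }
}

action = ProposedAction(
    tool="email.send",
    params={"to": "outside@example.com", "body": summary},
    delegation_id="analyst-session-42",
)

print(governed_call(action, policy, execute=send_email))  # DENY: external recipient
```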

Content safety is not action authority

Content safety evaluates text. Action authority evaluates consequence.

Content safety operates during generation. Action authority operates before execution.

Content safety produces a pass or fail on the model’s output. Action authority produces a decision — ALLOW, DENY, or ESCALATE — on a specific action, with a tamper-evident receipt that records the policy and delegation under which the decision was made.
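One plausible shape for that receipt is a hash chain: each record embeds the hash of the previous one, so no entry can be altered or removed without breaking every hash after it. A minimal sketch, continuing the hypothetical types above; a production system would add signatures and durable storage.

```python
import hashlib, json

def append_receipt(log: list, decision: str, action: ProposedAction,
                   policy_id: str) -> dict:
    """Append a hash-chained record of one authorisation decision."""
    prev_hash = log[-1]["hash"] if log else "genesis"
    record = {
        "decision": decision,                   # ALLOW / DENY / ESCALATE
        "tool": action.tool,
        "delegation_id": action.delegation_id,
        "policy_id": policy_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": prev_hash,                 # links this record to the last
    }
    # Hash the full record, prev_hash included: editing any earlier entry
    # invalidates every hash that follows it.
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(record)
    return record
```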

Treating guardrails as governance is like treating spell-check as legal review. Both examine documents. Only one establishes whether what the document authorises is legitimate.

What auditors will ask

When regulators investigate an incident involving an autonomous AI system, they will not ask whether the model’s output passed a content filter. They will ask:

  • Was this action authorised before it executed?
  • Under what policy?
  • By what delegation chain?
  • Can you produce a tamper-evident record of the decision?
  • Can you prove that record has not been altered?

Guardrails cannot answer these questions. Not because they are poorly implemented, but because they operate at a different boundary. They see the model’s words. They do not see the system’s actions. They produce logs, not evidence. They evaluate content, not authority.
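A governance layer, by contrast, can answer the last two questions mechanically: recompute every hash in the receipt chain and confirm the links hold. A sketch against the hypothetical receipt format above:

```python
def verify_chain(log: list) -> bool:
    """Recompute every hash; a single altered field breaks the chain."""
    prev_hash = "genesis"
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body["prev_hash"] != prev_hash:
            return False                        # a record was removed or reordered
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if record["hash"] != expected:
            return False                        # a record was altered after the fact
        prev_hash = record["hash"]
    return True
```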

The distinction

Guardrails are content safety. Governance is action authority. The first evaluates what the model says. The second evaluates what the system does. Confusing the two is how organisations end up with comprehensive content filtering and zero governance over the consequential actions their autonomous systems execute every day.

Post-execution controls are not governance. They are accounting.