Operations

Guardrails

The safety checks and policy enforcement that sit around an AI agent's inputs and outputs: content filters, scope enforcers, PII redactors, refusal patterns, and tool-call validators.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

Guardrails are the layer that prevents an AI agent from doing things it shouldn't: leaking PII, calling out-of-scope tools, generating prohibited content, acting on data the user is not authorized to see, or exceeding cost budgets. They run before the model (input guardrails), after the model (output guardrails), and around tool calls.
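As one illustration of a check that sits around the model rather than inside it, here is a minimal sketch of a cost-budget guardrail. The names CostBudget and BudgetExceeded, and the per-call cost figures, are hypothetical and chosen only for this example; a real system would derive costs from its own token accounting.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push spending past the configured limit."""


class CostBudget:
    """Hypothetical per-session budget tracker (illustrative only)."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Refuse before spending, so a rejected call leaves the total unchanged.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"call costing {cost_usd:.2f} USD would exceed "
                f"the {self.limit_usd:.2f} USD budget"
            )
        self.spent_usd += cost_usd
```

The agent loop would call charge() before each model or tool invocation and stop, or escalate to a human, when BudgetExceeded is raised.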

Why it matters

A model trained to be helpful will sometimes be helpful in the wrong direction. Guardrails are the deterministic layer around the non-deterministic model that catches those missteps before they ship. Without them, every safety property depends on the model behaving, which is not a foundation production teams accept.

How it works

Input guardrails (PII detection, prompt-injection screening, scope checks) gate what reaches the model. Output guardrails (content classifiers, citation verification, structured-format validators) gate what leaves. Tool-call guardrails (scope enforcement, parameter validation) gate what the agent does. Open-source toolkits like NVIDIA NeMo Guardrails and Guardrails AI provide common patterns; bespoke guardrails are written per workflow.
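The three gates described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the API of NeMo Guardrails or Guardrails AI: the Verdict type, the function names, the email regex, the injection phrases, and the lookup_order tool are all hypothetical. Production checks would typically rely on trained classifiers and much broader PII detection.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; real PII and injection detection is far broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
INJECTION_HINTS = ("ignore previous instructions", "disregard the system prompt")

# Hypothetical allow-list: tool name -> parameter validator.
ALLOWED_TOOLS = {
    "lookup_order": lambda p: isinstance(p.get("order_id"), str)
    and p["order_id"].isdigit(),
}


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""
    text: str = ""


def check_input(user_text: str) -> Verdict:
    """Input guardrail: block likely injections, redact emails."""
    lowered = user_text.lower()
    for hint in INJECTION_HINTS:
        if hint in lowered:
            return Verdict(False, "possible prompt injection")
    return Verdict(True, text=EMAIL_RE.sub("[EMAIL]", user_text))


def check_tool_call(name: str, params: dict) -> Verdict:
    """Tool-call guardrail: enforce scope, then validate parameters."""
    validator = ALLOWED_TOOLS.get(name)
    if validator is None:
        return Verdict(False, f"tool {name!r} is out of scope")
    if not validator(params):
        return Verdict(False, f"invalid parameters for {name!r}")
    return Verdict(True)


def check_output(model_text: str) -> Verdict:
    """Output guardrail: redact emails before the text leaves the system."""
    return Verdict(True, text=EMAIL_RE.sub("[EMAIL]", model_text))
```

Each check returns a verdict instead of raising, so the calling workflow can decide whether to refuse, retry, or escalate. Keeping the checks as ordinary deterministic functions also makes them straightforward to unit-test independently of the model.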

Related resources