Guardrails
The safety checks and policy enforcement that sit around an AI agent's inputs, outputs, and tool calls: content filters, scope enforcers, PII redactors, refusal patterns, and tool-call validators.
Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.
What it is
Guardrails are the layer that prevents an AI agent from doing things it shouldn't: leaking PII, calling out-of-scope tools, generating prohibited content, taking actions on data the user can't see, exceeding cost budgets. They run before the model (input guardrails), after the model (output guardrails), and around tool calls.
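The before/after ordering can be sketched as a wrapper that passes text through input guardrails, then the model, then output guardrails. This is a minimal sketch and not the API of any particular toolkit. `guarded_call`, `redact_emails`, `block_injection`, `require_json_object`, and `GuardrailViolation` are hypothetical names. The model is assumed to be any function from prompt text to response text. The email regex and the fixed phrase list are deliberately naive stand-ins for real PII and prompt-injection detection.

```python
import json
import re
from typing import Callable, Sequence

# A guard takes text and returns (possibly rewritten) text, or raises to block it.
Guard = Callable[[str], str]


class GuardrailViolation(Exception):
    """Raised when a guardrail blocks a prompt or a response."""


# Input guardrail: naive email redaction (real PII detection needs far more than one regex).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_emails(text: str) -> str:
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


# Input guardrail: toy prompt-injection screen using a hypothetical fixed phrase list.
INJECTION_PHRASES = ("ignore previous instructions", "disregard your system prompt")


def block_injection(text: str) -> str:
    lowered = text.lower()
    if any(phrase in lowered for phrase in INJECTION_PHRASES):
        raise GuardrailViolation("possible prompt injection")
    return text


# Output guardrail: structured-format validator that requires a JSON object.
def require_json_object(text: str) -> str:
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise GuardrailViolation("response is not valid JSON") from exc
    if not isinstance(parsed, dict):
        raise GuardrailViolation("response must be a JSON object")
    return text


def guarded_call(
    model: Callable[[str], str],
    prompt: str,
    input_guards: Sequence[Guard],
    output_guards: Sequence[Guard],
) -> str:
    """Run input guards, then the model, then output guards; any guard may raise."""
    for guard in input_guards:
        prompt = guard(prompt)
    response = model(prompt)
    for guard in output_guards:
        response = guard(response)
    return response
```

Because every guard either returns text or raises, a blocked request never reaches the model and a malformed response never reaches the caller: the wrapper fails closed. Guard order matters; in this sketch the injection screen runs on the raw prompt before redaction rewrites it.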
Why it matters
A model trained to be helpful will sometimes be helpful in the wrong direction. Guardrails are the deterministic layer around the non-deterministic model that catches the wrong direction before it ships. Without them, every safety property depends on the model behaving as intended, which is not a foundation production teams accept.
How it works
Input guardrails (PII detection, prompt-injection screening, scope checks) gate what reaches the model. Output guardrails (content classifiers, citation verification, structured-format validators) gate what leaves. Tool-call guardrails (scope enforcement, parameter validation) gate what the agent does. Open-source toolkits like NVIDIA NeMo Guardrails and Guardrails AI provide common patterns; bespoke guardrails are written per workflow.
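Tool-call guardrails can be sketched as a check that runs on every tool call the model proposes, before anything is dispatched. This is a minimal sketch under stated assumptions: `ToolSpec`, `validate_tool_call`, and `ToolCallRejected` are hypothetical names, each tool is assumed to declare one required permission scope, and parameters are described as a plain mapping from name to Python type rather than a full schema language.

```python
from dataclasses import dataclass


@dataclass
class ToolSpec:
    """Registry entry for one tool the agent may call (hypothetical schema)."""
    name: str
    required_scope: str        # permission the end user must hold
    params: dict[str, type]    # parameter name -> expected Python type


class ToolCallRejected(Exception):
    """Raised when a proposed tool call fails a guardrail; the call is never executed."""


def validate_tool_call(
    tool_name: str,
    args: dict[str, object],
    registry: dict[str, ToolSpec],
    user_scopes: set[str],
) -> None:
    """Check a model-proposed tool call before dispatching it; raise if it fails."""
    # Scope enforcement, part 1: the tool must exist in the registry.
    spec = registry.get(tool_name)
    if spec is None:
        raise ToolCallRejected(f"unknown tool: {tool_name!r}")
    # Scope enforcement, part 2: the user the agent acts for must hold the tool's scope.
    if spec.required_scope not in user_scopes:
        raise ToolCallRejected(f"missing scope {spec.required_scope!r} for {tool_name!r}")
    # Parameter validation: exact set of parameter names...
    missing = spec.params.keys() - args.keys()
    extra = args.keys() - spec.params.keys()
    if missing or extra:
        raise ToolCallRejected(
            f"bad parameters: missing={sorted(missing)}, unexpected={sorted(extra)}"
        )
    # ...then a type check for each one.
    for name, expected in spec.params.items():
        if not isinstance(args[name], expected):
            raise ToolCallRejected(
                f"{name!r} must be {expected.__name__}, got {type(args[name]).__name__}"
            )
```

Checking scope against the end user's permissions, not the agent's, is what keeps the agent from acting on data the user cannot see. `isinstance` is a coarse type check (for example, `True` passes as an `int`); a production validator would more likely check arguments against each tool's declared JSON Schema.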
Related resources
The policy layer for what an AI system is allowed to read, call, decide, and ship — encoded as configuration the runtime enforces, not as a document on a shared drive.
A point in an AI workflow where an action is suspended until a human reviews and approves, rejects, or modifies it.
A governed catalog of every tool an AI agent can call — your APIs, your databases, your internal systems — with typed schemas, permission scopes, audit trails, and the standard protocol (MCP) that turns 'we exposed it to the LLM' into 'we know exactly who called what when'.
The versioned, tested set of rules and templates that govern how prompts are assembled for an AI workflow — instructions, examples, formatting, refusal patterns, escalation language.
Scoring AI workflow traces — not just final outputs — to detect quality issues at the step level: bad retrievals, wrong tool calls, low-confidence reasoning.