Operations

Guardrails

The safety checks and policy enforcements that sit around an AI agent's inputs and outputs — content filters, scope enforcers, PII redactors, refusal patterns, and tool-call validators.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

Guardrails are the layer that prevents an AI agent from doing things it shouldn't: leaking PII, calling out-of-scope tools, generating prohibited content, acting on data the user can't see, or exceeding cost budgets. They run before the model (input guardrails), after the model (output guardrails), and around tool calls.

Why it matters

A model trained to be helpful will sometimes be helpful in the wrong direction. Guardrails are the deterministic layer around the non-deterministic model that catches a wrong-direction response or action before it ships. Without them, every safety property depends on the model behaving, which is not a foundation production teams accept.

How it works

Input guardrails (PII detection, prompt-injection screening, scope checks) gate what reaches the model. Output guardrails (content classifiers, citation verification, structured-format validators) gate what leaves. Tool-call guardrails (scope enforcement, parameter validation) gate what the agent does. Open-source toolkits like NVIDIA NeMo Guardrails and Guardrails AI provide common patterns; bespoke guardrails are written per workflow.
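The three gates described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the API of NeMo Guardrails or Guardrails AI: the function names, the `Verdict` type, the injection phrases, the email regex, and the `search_orders` tool allowlist are all hypothetical and stand in for whatever detectors and policies a real workflow would use.

```python
import re
from dataclasses import dataclass

# Hypothetical policy values, chosen only for illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
INJECTION_MARKERS = ("ignore previous instructions", "disregard the system prompt")
ALLOWED_TOOLS = {"search_orders": {"order_id"}}  # tool name -> permitted parameter names


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""
    text: str = ""


def input_guardrail(user_text: str) -> Verdict:
    """Gate what reaches the model: screen for injection, then redact emails."""
    lowered = user_text.lower()
    for marker in INJECTION_MARKERS:
        if marker in lowered:
            return Verdict(False, f"possible prompt injection: {marker!r}")
    return Verdict(True, text=EMAIL_RE.sub("[REDACTED_EMAIL]", user_text))


def output_guardrail(model_text: str, max_chars: int = 2000) -> Verdict:
    """Gate what leaves: block replies that leak an email or run too long."""
    if EMAIL_RE.search(model_text):
        return Verdict(False, "output contains an email address")
    if len(model_text) > max_chars:
        return Verdict(False, "output exceeds length limit")
    return Verdict(True, text=model_text)


def tool_call_guardrail(tool: str, params: dict) -> Verdict:
    """Gate what the agent does: enforce tool scope and parameter names."""
    if tool not in ALLOWED_TOOLS:
        return Verdict(False, f"tool {tool!r} is out of scope")
    extra = set(params) - ALLOWED_TOOLS[tool]
    if extra:
        return Verdict(False, f"unexpected parameters: {sorted(extra)}")
    return Verdict(True)
```

In this sketch each gate returns a verdict instead of raising, so the calling workflow can decide whether to refuse, retry, or escalate to a human. Production guardrails would typically replace the keyword and regex checks with dedicated classifiers and schema validators.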

Related resources

Governance

The policy layer for what an AI system is allowed to read, call, decide, and ship — encoded as configuration the runtime enforces, not as a document on a shared drive.

Approval Gate

A point in an AI workflow where an action is suspended until a human reviews and approves, rejects, or modifies it.

MCP Tool Registry

A governed catalog of every tool an AI agent can call — your APIs, your databases, your internal systems — with typed schemas, permission scopes, audit trails, and the standard protocol (MCP) that turns 'we exposed it to the LLM' into 'we know exactly who called what when'.

Prompt Policy

The versioned, tested set of rules and templates that govern how prompts are assembled for an AI workflow — instructions, examples, formatting, refusal patterns, escalation language.

Trace Grading

Scoring AI workflow traces — not just final outputs — to detect quality issues at the step level: bad retrievals, wrong tool calls, low-confidence reasoning.
