Use cases

Where the capability map becomes workflow.

Concrete patterns for turning data infrastructure and agent primitives into production outcomes with owners, boundaries, traces, and measurable thresholds.

Data substrate

Storage, contracts, lineage, quality, and retrieval readiness.

Agent runtime

Routing, tools, approvals, replay, and governed execution.

Model Routing Policy

Per-step routing rules across providers and models — privacy, cost, latency, quality, fallback — declared in the gateway, not buried in workflow code.

Private Inference

On-premise or VPC-bound model deployments for sensitive workloads — open-weight models served on hardware you control, with the same gateway and governance as cloud routes.

Model Fallback Strategy

Declarative fallback chains for provider outages, rate limits, and quality regressions, defined per workflow rather than coded ad hoc at each call site.

Tool Permissions

Per-tool, per-agent, per-tenant permission scopes enforced by the MCP registry — the boundary between 'an agent can call X' and 'an agent will call X with these credentials'.
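
One way to picture that boundary is a lookup keyed on the (tool, agent, tenant) triple that returns a credential reference or refuses. This is a minimal sketch under assumed names (`Grant`, `PermissionDenied`, `resolve_credential`); it does not describe how any particular MCP registry stores or enforces scopes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Grant:
    tool: str
    agent: str
    tenant: str
    credential_ref: str  # a reference to a secret, never the secret itself


class PermissionDenied(Exception):
    """Raised when no grant covers the requested (tool, agent, tenant) triple."""


def resolve_credential(grants: Iterable[Grant], tool: str, agent: str, tenant: str) -> str:
    """Return the credential reference for an allowed call, or refuse."""
    for grant in grants:
        if (grant.tool, grant.agent, grant.tenant) == (tool, agent, tenant):
            return grant.credential_ref
    raise PermissionDenied(f"{agent} may not call {tool} for tenant {tenant}")


# Illustrative grants: the same tool maps to different credentials per tenant.
GRANTS = [
    Grant("crm.lookup", "triage-agent", "acme", "vault://acme/crm-readonly"),
    Grant("crm.lookup", "triage-agent", "globex", "vault://globex/crm-readonly"),
]
```

Resolving the credential at the same moment as the permission check keeps the two from drifting apart: an agent that is not granted a tool for a tenant never sees a credential for it.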

Tool Schema Contracts

JSON Schema input and output contracts for every MCP tool — versioned, validated, and used by both the registry and the eval set.

Tool Audit Trails

Per-invocation logs of which tool was called, by which agent, on behalf of which principal, with what inputs and what result: the substrate every audit eventually reads from.

Hybrid Retrieval

BM25 keyword search and dense vector search combined with a tunable weight, plus a cross-encoder rerank on the top-k candidates. This configuration typically outperforms vector-only retrieval.
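
The weighting step can be sketched as a score fusion: normalize each retriever's scores to a common range, blend them with a weight, keep the top-k, and optionally reorder those with a reranker. The function names, the `alpha` parameter, and the min-max normalization are assumptions made for illustration; real systems may fuse differently (for example by rank rather than score), and the `rerank` callable stands in for a cross-encoder.

```python
from typing import Callable, Optional


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into [0, 1] so BM25 and cosine scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    bm25: dict[str, float],
    dense: dict[str, float],
    alpha: float = 0.5,   # assumed weight: 0.0 = keyword only, 1.0 = vector only
    top_k: int = 10,
    rerank: Optional[Callable[[str], float]] = None,  # stand-in for a cross-encoder
) -> list[str]:
    kw, vec = min_max(bm25), min_max(dense)
    # A document missing from one retriever contributes 0 from that side.
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    # Sort by fused score, breaking ties by document id for determinism.
    candidates = sorted(fused, key=lambda d: (-fused[d], d))[:top_k]
    if rerank is not None:
        # Only the top-k survivors are rescored by the (expensive) reranker.
        candidates = sorted(candidates, key=lambda d: (-rerank(d), d))
    return candidates
```

Because `alpha` and `top_k` are plain parameters, they can be swept against an eval set per workflow rather than fixed once.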

Citation Quality

Agents that cite sources accurately, attribute claims to the chunk they came from, and report when a claim has no source — the foundation of trustworthy output.

Reranking Policy

How many candidates to rerank, with which model, against which signals — tuned per workflow against the eval set instead of copied from a tutorial.

Support Triage

Inbound support tickets classified, routed, and partially resolved by an agent with permission-scoped access to account context and the support knowledge base.

Owner Detection

Identifying the right human or team to route a request to — by tenant, region, product area, expertise, on-call, or load — instead of dropping it in a shared queue.

Escalation Policy

Declared rules for when an agent stops trying and hands off to a human — confidence floor, risk class, retry count, SLA pressure — encoded so escalation is consistent across the team.
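
To make "encoded" concrete, the sketch below expresses the four signals named above as fields on a policy object and returns the reasons an agent should hand off. All names and default thresholds (`EscalationPolicy`, `should_escalate`, the 0.7 floor, the 300-second SLA buffer) are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationPolicy:
    confidence_floor: float = 0.7           # assumed threshold
    max_retries: int = 2                    # assumed retry budget
    escalate_risk_classes: frozenset = frozenset({"high"})
    sla_buffer_seconds: float = 300.0       # hand off when this close to breach


def should_escalate(
    policy: EscalationPolicy,
    confidence: float,
    risk_class: str,
    retries: int,
    seconds_to_sla: float,
) -> list[str]:
    """Return every reason to hand off; an empty list means keep going."""
    reasons = []
    if confidence < policy.confidence_floor:
        reasons.append("confidence_below_floor")
    if risk_class in policy.escalate_risk_classes:
        reasons.append("risk_class")
    if retries >= policy.max_retries:
        reasons.append("retry_limit")
    if seconds_to_sla <= policy.sla_buffer_seconds:
        reasons.append("sla_pressure")
    return reasons
```

Returning the list of reasons rather than a bare boolean means the handoff can tell the human why the agent stopped, and the same reasons can be logged for later review.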

Issue to PR

An engineering workflow that takes an issue, gathers context across the codebase and traces, and proposes a pull request, or a runbook step when a code change is not the right move.

Tool Execution

Reliable invocation of MCP tools with timeouts, retries, idempotency, and trace propagation — the boring infrastructure that makes agents trustworthy.
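
A minimal sketch of those four concerns in one wrapper follows. It assumes a hypothetical tool callable that accepts `trace_id`, `idempotency_key`, and `timeout_s` keyword arguments and raises `TransientToolError` or `TimeoutError` on retryable failures; none of these names come from the MCP specification.

```python
import time
import uuid
from typing import Any, Callable


class TransientToolError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def invoke_tool(
    tool: Callable[..., Any],
    payload: dict,
    trace_id: str,
    *,
    timeout_s: float = 5.0,
    max_attempts: int = 3,
    backoff_s: float = 0.0,
) -> Any:
    # One key for the whole invocation, reused on every retry, so the tool
    # can deduplicate side effects if an earlier attempt partly succeeded.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(
                payload,
                trace_id=trace_id,              # propagate the caller's trace
                idempotency_key=idempotency_key,
                timeout_s=timeout_s,            # the tool enforces its own deadline
            )
        except (TransientToolError, TimeoutError):
            if attempt == max_attempts:
                raise
            # Exponential backoff between attempts: backoff_s, 2x, 4x, ...
            time.sleep(backoff_s * 2 ** (attempt - 1))
```

The design choice worth noting is that the idempotency key is generated outside the retry loop: retries without a stable key can repeat side effects, which is the failure mode this wrapper is meant to avoid.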

Trace Replay

Deterministic replay of a workflow run from its stored trace — the debugging primitive that makes 'why did the agent do that' a question with an answer.

Risk Review

Pre-deployment review of workflow changes against risk class — what data is touched, what tools are invoked, what side effects are possible — beyond functional review.

Approval Inbox

A reviewable inbox of pending agent actions (context-rich, SLA-bound, routable) where high-impact workflows go for human sign-off before execution.

Correction Capture

Capturing human corrections to agent output — text edits, action changes, reroutes — and converting them into eval cases, knowledge updates, or workflow changes.

Workflow Mutation

Generating workflow variants — different prompts, models, retrieval policies, node shapes, tool budgets — for the eval harness to score before any change reaches production.

Evals and optimization

Mutation, generated-code tests, regression datasets, gates, and deploy safety.

Closed-loop intelligence

Conversation signal, root-cause analysis, knowledge updates, and review queues.

Support Thread Analysis

Aggregated analysis of support tickets — intent, subject, sentiment, resolution time, citations — feeding product, engineering, and ops with the same source of truth.

Webchat Signal

Pre-sale, on-site, and in-product webchat captured into the same signal layer as support: intent, friction, drop-off, and mentions.

Intent Routing

The layer that classifies what each conversation is actually asking for, so requests are routed by intent rather than by channel or keyword.

Sentiment Trends

Trend lines on sentiment, satisfaction, and dissatisfaction signals across channels and cohorts — calibrated against a CSAT or NPS baseline so the curve means something.

Product Signal

Feature requests, missing capabilities, competitor mentions, and 'wish-it-did-X' patterns extracted from real conversations and routed to product.

Bad Thread Clusters

Clusters of failed or low-quality threads, grouped by failure pattern, with example threads and a recommended owner — the unit of investment in conversation intelligence.

Root-Cause Analysis

Diagnosing whether a cluster of bad threads is retrieval, reasoning, tool, knowledge, or policy — and routing the fix to the right surface.

Resolution QA

Sampling and scoring resolved threads — agent-handled and human-handled — to calibrate quality, detect bad resolutions, and feed eval set additions.

Knowledge Updates

Resolved threads, corrections, and approved drafts flowing back into the knowledge base — versioned, attributed, and reviewable.

Eval Case Capture

Hard or high-value conversations promoted into the eval set — gold examples to preserve and regressions to prevent.

Human Review Queue

A unified queue for human review across approvals, low-confidence outputs, escalations, and flagged corrections — with SLAs and load balancing.