Use cases

Where the capability map becomes workflow.

Concrete patterns for turning data infrastructure and agent primitives into production outcomes with owners, boundaries, traces, and measurable thresholds.

Data substrate

Storage, contracts, lineage, quality, and retrieval readiness.

AI-Ready Storage

Object storage and open table formats organized so analytics, retrieval, and training read from the same governed substrate.

Governed Datasets

Datasets with declared owners, access boundaries, freshness SLOs, and retention rules — the unit of agent-safe data sharing.

Analytics Workloads

BI and ad-hoc analytical queries running on the same lakehouse that powers retrieval and agents — without a separate warehouse copy.

Event Stream Ingestion

Kafka, Kinesis, or Pulsar streams landed in the lakehouse with exactly-once semantics, schema enforcement, and per-stream contracts.

Batch Pipeline Modernization

Migrating ad-hoc cron jobs, SQL scripts, and orchestration scripts into a versioned, tested, observable batch substrate.

Source Contracts

Per-source agreements declaring shape, freshness, owner, error budget, and breakage policy — the discipline that makes pipeline failures attributable.

Lineage Mapping

Column-level and table-level lineage from source to consumer — the substrate every impact analysis, deprecation, and audit eventually needs.

Data Quality Gates

Validation checks that block bad data from propagating downstream — at ingestion, at transformation, and at publication.
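As a minimal sketch of a gate at one of these stages, the example below blocks a whole batch when any row fails a check. The check names, the `order_id` and `amount` fields, and the `max_failure_rate` parameter are hypothetical and chosen only for illustration; real gates would be declared per dataset.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Check = Callable[[Row], bool]

# Hypothetical checks for an illustrative "orders" batch.
CHECKS: Dict[str, Check] = {
    "order_id present": lambda r: bool(r.get("order_id")),
    "amount non-negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


class QualityGateError(Exception):
    """Raised when a batch must not propagate downstream."""


def enforce_gate(
    rows: Iterable[Row],
    checks: Dict[str, Check] = CHECKS,
    max_failure_rate: float = 0.0,
) -> List[Row]:
    """Return the batch unchanged if it passes; raise to block it otherwise."""
    batch = list(rows)
    bad = [r for r in batch if not all(check(r) for check in checks.values())]
    rate = len(bad) / len(batch) if batch else 0.0
    if rate > max_failure_rate:
        raise QualityGateError(f"{len(bad)} of {len(batch)} rows failed checks")
    return batch


clean = [{"order_id": "A1", "amount": 10.0}, {"order_id": "A2", "amount": 0}]
published = enforce_gate(clean)  # passes, so the batch moves on

dirty = clean + [{"order_id": "", "amount": -5}]
blocked = None
try:
    enforce_gate(dirty)
except QualityGateError as exc:
    blocked = str(exc)  # the batch is stopped here instead of being published
```

Raising rather than filtering is a deliberate choice in this sketch: the bad batch stays visible to its owner instead of being silently trimmed.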

Retrieval Readiness

Tables, documents, and indexes prepared for agent retrieval — chunked, embedded, metadata-tagged, permission-resolved, and tested.

Agent runtime

Routing, tools, approvals, replay, and governed execution.

Model Routing Policy

Per-step routing rules across providers and models — privacy, cost, latency, quality, fallback — declared in the gateway, not buried in workflow code.

Private Inference

On-premise or VPC-bound model deployments for sensitive workloads — open-weight models served on hardware you control, with the same gateway and governance as cloud routes.

Model Fallback Strategy

Declarative fallback chains for provider outages, rate limits, and quality regressions — defined per workflow, not coded ad-hoc per call site.
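One way to picture a declarative chain is as data that a single helper walks in order. The sketch below assumes hypothetical route names (`primary`, `secondary`, `private`), a hypothetical `ProviderError`, and stand-in client functions; it does not represent any particular gateway's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


class ProviderError(Exception):
    """Hypothetical error a route raises on an outage or rate limit."""


@dataclass(frozen=True)
class Route:
    name: str
    max_attempts: int = 1


# Declared once per workflow, not at each call site.
FALLBACK_CHAINS: Dict[str, List[Route]] = {
    "support_triage": [Route("primary", 2), Route("secondary"), Route("private")],
}


def call_with_fallback(
    workflow: str, prompt: str, clients: Dict[str, Callable[[str], str]]
) -> Tuple[str, str]:
    """Try each route in declared order; return (route name, response)."""
    errors: List[str] = []
    for route in FALLBACK_CHAINS[workflow]:
        for attempt in range(route.max_attempts):
            try:
                return route.name, clients[route.name](prompt)
            except ProviderError as exc:
                errors.append(f"{route.name}#{attempt + 1}: {exc}")
    raise ProviderError("all routes failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    raise ProviderError("rate limited")


def steady_secondary(prompt: str) -> str:
    return f"secondary answered: {prompt}"


clients = {
    "primary": flaky_primary,
    "secondary": steady_secondary,
    "private": steady_secondary,
}
used_route, answer = call_with_fallback("support_triage", "classify ticket", clients)
```

Because the chain is plain data, changing the order or adding a route is a configuration change that can be reviewed and scored like any other workflow variant.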

Tool Permissions

Per-tool, per-agent, per-tenant permission scopes enforced by the MCP registry — the boundary between 'an agent can call X' and 'an agent will call X with these credentials'.

Tool Schema Contracts

JSON Schema input and output contracts for every MCP tool — versioned, validated, and used by both the registry and the eval set.

Tool Audit Trails

Per-invocation logs of what tool was called by what agent on behalf of what principal with what inputs and what result — the substrate every audit eventually reads from.

Hybrid Retrieval

BM25 keyword search and dense vector search combined with a tunable weight, plus a cross-encoder rerank on the top-k: a configuration that tends to beat vector-only retrieval.
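The weighted combination can be sketched as a score fusion step. The example below assumes the BM25 and dense scores have already been computed elsewhere, uses min-max normalization as one possible way to make them comparable, and leaves the cross-encoder rerank out; `alpha`, the document IDs, and the scores are illustrative.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1]; a constant score set maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    bm25: Dict[str, float],
    dense: Dict[str, float],
    alpha: float = 0.5,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Blend normalized keyword and vector scores; alpha weights the dense side."""
    kw, vec = min_max(bm25), min_max(dense)
    docs = set(kw) | set(vec)
    fused = {
        d: (1 - alpha) * kw.get(d, 0.0) + alpha * vec.get(d, 0.0) for d in docs
    }
    # Sort by fused score, breaking ties by document ID for stable output.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]  # these candidates would be handed to the reranker


bm25_scores = {"doc_a": 12.0, "doc_b": 4.0, "doc_c": 8.0}
dense_scores = {"doc_a": 0.25, "doc_b": 0.75, "doc_d": 0.5}
top = hybrid_rank(bm25_scores, dense_scores, alpha=0.25, top_k=3)
```

Here `alpha` is the tunable weight: 0.0 is keyword-only, 1.0 is vector-only, and the right value is something to measure per workflow rather than assume.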

Citation Quality

Agents that cite sources accurately, attribute claims to the chunk they came from, and report when a claim has no source — the foundation of trustworthy output.

Reranking Policy

How many candidates to rerank, with which model, against which signals — tuned per workflow against the eval set instead of copied from a tutorial.

Support Triage

Inbound support tickets classified, routed, and partially resolved by an agent with permission-scoped access to account context and the support knowledge base.

Owner Detection

Identifying the right human or team to route a request to — by tenant, region, product area, expertise, on-call, or load — instead of dropping it in a shared queue.

Escalation Policy

Declared rules for when an agent stops trying and hands off to a human — confidence floor, risk class, retry count, SLA pressure — encoded so escalation is consistent across the team.
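The four signals named above can be encoded as a small policy object so that every workflow evaluates them the same way. This is a sketch under assumptions: the thresholds, field names, and risk class labels are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class EscalationPolicy:
    confidence_floor: float = 0.7
    max_retries: int = 2
    escalate_risk_classes: FrozenSet[str] = frozenset({"high"})
    min_sla_seconds_left: int = 300


@dataclass(frozen=True)
class RunState:
    confidence: float
    retries: int
    risk_class: str
    sla_seconds_left: int


def escalation_reasons(state: RunState, policy: EscalationPolicy) -> List[str]:
    """Return every rule that fired; an empty list means the agent continues."""
    reasons: List[str] = []
    if state.confidence < policy.confidence_floor:
        reasons.append("confidence below floor")
    if state.retries >= policy.max_retries:
        reasons.append("retry budget exhausted")
    if state.risk_class in policy.escalate_risk_classes:
        reasons.append("risk class requires a human")
    if state.sla_seconds_left < policy.min_sla_seconds_left:
        reasons.append("SLA pressure")
    return reasons


policy = EscalationPolicy()
state = RunState(confidence=0.55, retries=2, risk_class="low", sla_seconds_left=1200)
reasons = escalation_reasons(state, policy)
should_escalate = bool(reasons)
```

Returning the reasons rather than a bare boolean keeps the handoff explainable to the human who receives it.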

Issue to PR

An engineering workflow that takes an issue, gathers context across the codebase and traces, and proposes a pull request — or a runbook step when code change is not the right move.

Tool Execution

Reliable invocation of MCP tools with timeouts, retries, idempotency, and trace propagation — the boring infrastructure that makes agents trustworthy.
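A minimal sketch of the retry and idempotency parts follows. It assumes an in-memory result store, a hypothetical `TransientToolError`, and a stand-in `create_ticket` tool; timeouts and trace propagation are left out to keep the example short.

```python
import uuid
from typing import Any, Callable, Dict, Optional


class TransientToolError(Exception):
    """Hypothetical error for failures that are safe to retry."""


_results: Dict[str, Any] = {}  # idempotency key -> stored result


def execute_tool(
    tool: Callable[..., Any],
    args: Dict[str, Any],
    idempotency_key: str,
    max_attempts: int = 3,
) -> Any:
    """Run a tool at most once per key, retrying only transient failures."""
    if idempotency_key in _results:
        return _results[idempotency_key]  # replayed request: no second side effect
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            result = tool(**args)
        except TransientToolError as exc:
            last_error = exc
            continue
        _results[idempotency_key] = result
        return result
    raise RuntimeError(f"tool failed after {max_attempts} attempts") from last_error


calls = {"count": 0}


def create_ticket(title: str) -> dict:
    """Stand-in tool that fails once, then succeeds."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise TransientToolError("upstream timeout")
    return {"ticket": title, "attempt": calls["count"]}


key = str(uuid.uuid4())
first = execute_tool(create_ticket, {"title": "reset password"}, key)
second = execute_tool(create_ticket, {"title": "reset password"}, key)
```

The second call returns the stored result without invoking the tool again, which is the property that makes retries safe for tools with side effects.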

Trace Replay

Deterministic replay of a workflow run from its stored trace — the debugging primitive that makes 'why did the agent do that' a question with an answer.

Risk Review

Pre-deployment review of workflow changes against risk class — what data is touched, what tools are invoked, what side effects are possible — beyond functional review.

Approval Inbox

A reviewable inbox of pending agent actions — context-rich, SLA-bound, routable — where high-impact workflows go for human signoff before execution.

Correction Capture

Capturing human corrections to agent output — text edits, action changes, reroutes — and converting them into eval cases, knowledge updates, or workflow changes.

Workflow Mutation

Generating workflow variants — different prompts, models, retrieval policies, node shapes, tool budgets — for the eval harness to score before any change reaches production.

Evals and optimization

Mutation, generated-code tests, regression datasets, gates, and deploy safety.

Generated-Code Tests

Test coverage for the agent-generated code that lives inside workflows — handlers, transformers, validators — so the eval set catches regressions in code, not just prompts.

Regression Datasets

Curated eval cases drawn from production failures — every fixed bug becomes a regression that future changes must clear.

Cost Reduction

Eval-harness-driven sweeps of models, prompts, retrieval depth, and tool budgets to find the cheapest configuration that still passes quality gates.

Quality Gates

Hard floors on the quality metrics a candidate must clear before promotion — encoded so a change cannot ship to production by accident.
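Encoded floors can be as simple as a table of metric names and minimum values checked before promotion. The metric names and thresholds below are hypothetical; the sketch treats a missing metric as a failure so that an incomplete eval run cannot pass by omission.

```python
from typing import Dict, List

# Hypothetical floors; real values would come from the team's eval set.
FLOORS: Dict[str, float] = {
    "answer_accuracy": 0.90,
    "citation_precision": 0.85,
}


def gate_failures(
    metrics: Dict[str, float], floors: Dict[str, float] = FLOORS
) -> List[str]:
    """List every floor the candidate misses; empty means it may be promoted."""
    failures: List[str] = []
    for name, floor in floors.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif value < floor:
            failures.append(f"{name}: {value} < floor {floor}")
    return failures


candidate = {"answer_accuracy": 0.93, "citation_precision": 0.81}
failures = gate_failures(candidate)
can_promote = not failures
```

Running this check in the deploy pipeline, rather than relying on a reviewer to read a dashboard, is what makes the floor hard.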

Latency Budgets

Per-workflow p95 and p99 latency targets enforced as promotion gates and tracked in production — so 'feels slow' becomes a quantified, attributable signal.

Safe Deploys

Canary, staged rollout, and rollback for agent workflow changes — the deployment primitives every other engineering team has, applied to LLM and prompt changes.

Eval Dashboards

Operational dashboards that show the eval-set score, regression count, latency, and cost over time — the visible state of the team's quality discipline.

Drift Alerts

Alerts when production behavior diverges from eval-set expectations: a model deprecation, a prompt change in an upstream library, or a corpus that has quietly aged.

Slack to Knowledge

Resolved high-quality Slack threads converted into knowledge-base updates — with consent, attribution, and approval by the channel owner.

Closed-loop intelligence

Conversation signal, root-cause analysis, knowledge updates, and review queues.

Support Thread Analysis

Aggregated analysis of support tickets — intent, subject, sentiment, resolution time, citations — feeding product, engineering, and ops with the same source of truth.

Webchat Signal

Pre-sale, on-site, and in-product webchat captured into the same signal layer as support — intent, friction, dropoff, mention.

Intent Routing

The intelligence layer that identifies what people actually want from each message, so requests can be routed on intent rather than on surface keywords.

Sentiment Trends

Trend lines on sentiment, satisfaction, and dissatisfaction signals across channels and cohorts — calibrated against a CSAT or NPS baseline so the curve means something.

Product Signal

Feature requests, missing capabilities, competitor mentions, and 'wish-it-did-X' patterns extracted from real conversations and routed to product.

Bad Thread Clusters

Clusters of failed or low-quality threads, grouped by failure pattern, with example threads and a recommended owner — the unit of investment in conversation intelligence.

Root-Cause Analysis

Diagnosing whether a cluster of bad threads is retrieval, reasoning, tool, knowledge, or policy — and routing the fix to the right surface.

Resolution QA

Sampling and scoring resolved threads, both agent-handled and human-handled, to calibrate quality, detect bad resolutions, and feed new cases into the eval set.

Knowledge Updates

Resolved threads, corrections, and approved drafts flowing back into the knowledge base — versioned, attributed, and reviewable.

Eval Case Capture

Hard or high-value conversations promoted into the eval set — gold examples to preserve and regressions to prevent.

Human Review Queue

A unified queue for human review across approvals, low-confidence outputs, escalations, and flagged corrections — with SLAs and load balancing.