Where the capability map becomes workflow.
Concrete patterns for turning data infrastructure and agent primitives into production outcomes with owners, boundaries, traces, and measurable thresholds.
Workflow Mutation
Generating workflow variants — different prompts, models, retrieval policies, node shapes, tool budgets — for the eval harness to score before any change reaches production.
Approval Inbox
A reviewable inbox of pending agent actions — context-rich, SLA-bound, routable — where high-impact workflows go for human signoff before execution.
Slack to Knowledge
Resolved high-quality Slack threads converted into knowledge-base updates — with consent, attribution, and approval by the channel owner.
Data substrate
Storage, contracts, lineage, quality, and retrieval readiness.
AI-Ready Storage
Object storage and open table formats organized so analytics, retrieval, and training read from the same governed substrate.
Governed Datasets
Datasets with declared owners, access boundaries, freshness SLOs, and retention rules — the unit of agent-safe data sharing.
Analytics Workloads
BI and ad-hoc analytical queries running on the same lakehouse that powers retrieval and agents — without a separate warehouse copy.
Event Stream Ingestion
Kafka, Kinesis, or Pulsar streams landed in the lakehouse with exactly-once semantics, schema enforcement, and per-stream contracts.
Batch Pipeline Modernization
Migrating ad-hoc cron jobs, SQL scripts, and orchestration scripts into a versioned, tested, observable batch substrate.
Source Contracts
Per-source agreements declaring shape, freshness, owner, error budget, and breakage policy — the discipline that makes pipeline failures attributable.
Lineage Mapping
Column-level and table-level lineage from source to consumer — the substrate every impact analysis, deprecation, and audit eventually needs.
Data Quality Gates
Validation checks that block bad data from propagating downstream — at ingestion, at transformation, and at publication.
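A minimal sketch of such a gate, assuming rows arrive as plain dictionaries. The names `Check`, `GateFailure`, and `run_gate` are hypothetical and only illustrate the blocking behavior; they are not part of any particular product or library.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # assumption: each record is a plain dict


@dataclass(frozen=True)
class Check:
    """A named predicate that every row is expected to satisfy."""
    name: str
    predicate: Callable[[Row], bool]


class GateFailure(Exception):
    """Raised when a batch must not propagate downstream."""


def run_gate(rows: list[Row], checks: list[Check],
             max_failure_rate: float = 0.0) -> list[Row]:
    """Return rows unchanged if every check stays within its failure budget."""
    failures = {check.name: 0 for check in checks}
    for row in rows:
        for check in checks:
            if not check.predicate(row):
                failures[check.name] += 1
    total = len(rows)
    breached = {name: count for name, count in failures.items()
                if total and count / total > max_failure_rate}
    if breached:
        # Blocking here is the point: bad data never reaches the next stage.
        raise GateFailure(f"blocked batch, failing checks: {breached}")
    return rows
```

The same function can be called at ingestion, after transformation, and before publication, with a different list of checks at each stage.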
Retrieval Readiness
Tables, documents, and indexes prepared for agent retrieval — chunked, embedded, metadata-tagged, permission-resolved, and tested.
Agent runtime
Routing, tools, approvals, replay, and governed execution.
Model Routing Policy
Per-step routing rules across providers and models — privacy, cost, latency, quality, fallback — declared in the gateway, not buried in workflow code.
Private Inference
On-premise or VPC-bound model deployments for sensitive workloads — open-weight models served on hardware you control, with the same gateway and governance as cloud routes.
Model Fallback Strategy
Declarative fallback chains for provider outages, rate limits, and quality regressions — defined per workflow, not coded ad-hoc per call site.
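One way to picture a declarative chain is as an ordered list of routes that a single helper walks through. The sketch below is illustrative only: `Route`, `ProviderError`, and `complete_with_fallback` are hypothetical names, and real provider clients would replace the plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(Exception):
    """Assumed umbrella error for outages, rate limits, and similar failures."""


@dataclass(frozen=True)
class Route:
    name: str
    call: Callable[[str], str]  # hypothetical: prompt in, completion out


def complete_with_fallback(prompt: str, chain: Sequence[Route]) -> tuple[str, str]:
    """Try each route in declared order; return (route name, completion)."""
    errors = []
    for route in chain:
        try:
            return route.name, route.call(prompt)
        except ProviderError as exc:
            # Record the failure and move on to the next declared route.
            errors.append(f"{route.name}: {exc}")
    raise ProviderError("all routes failed: " + "; ".join(errors))
```

Because the chain is data rather than control flow, it can be declared once per workflow and reviewed like any other configuration.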
Tool Permissions
Per-tool, per-agent, per-tenant permission scopes enforced by the MCP registry — the boundary between 'an agent can call X' and 'an agent will call X with these credentials'.
Tool Schema Contracts
JSON Schema input and output contracts for every MCP tool — versioned, validated, and used by both the registry and the eval set.
Tool Audit Trails
Per-invocation logs of what tool was called by what agent on behalf of what principal with what inputs and what result — the substrate every audit eventually reads from.
Hybrid Retrieval
BM25 keyword search and dense vector search combined with a tunable weight, plus a cross-encoder rerank of the top-k results: a configuration that typically beats vector-only retrieval.
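The weighted combination can be sketched as follows, assuming BM25 and dense scores have already been computed per document ID. Min-max normalization and the parameter name `alpha` are assumptions made for illustration, and the cross-encoder rerank step is left out.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (value - lo) / (hi - lo) for doc, value in scores.items()}


def hybrid_rank(bm25: dict[str, float], dense: dict[str, float],
                alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend normalized scores; alpha=1.0 is dense-only, alpha=0.0 is BM25-only."""
    b, d = min_max(bm25), min_max(dense)
    blended = {
        doc: alpha * d.get(doc, 0.0) + (1.0 - alpha) * b.get(doc, 0.0)
        for doc in set(b) | set(d)
    }
    # Highest score first; ties broken by document ID for stable output.
    return sorted(blended.items(), key=lambda pair: (-pair[1], pair[0]))
```

The top-k of this list would then be passed to a reranker, and `alpha` is the kind of weight that would be tuned against an eval set rather than fixed up front.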
Citation Quality
Agents that cite sources accurately, attribute claims to the chunk they came from, and report when a claim has no source — the foundation of trustworthy output.
Reranking Policy
How many candidates to rerank, with which model, against which signals — tuned per workflow against the eval set instead of copied from a tutorial.
Support Triage
Inbound support tickets classified, routed, and partially resolved by an agent with permission-scoped access to account context and the support knowledge base.
Owner Detection
Identifying the right human or team to route a request to — by tenant, region, product area, expertise, on-call, or load — instead of dropping it in a shared queue.
Escalation Policy
Declared rules for when an agent stops trying and hands off to a human — confidence floor, risk class, retry count, SLA pressure — encoded so escalation is consistent across the team.
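As a rough sketch, the four signals named above can be encoded as one policy object and one pure function. All field names and default thresholds here are hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationPolicy:
    confidence_floor: float = 0.7           # assumed threshold
    max_retries: int = 2                    # assumed retry limit
    escalate_risk_classes: frozenset = frozenset({"high"})
    min_seconds_to_sla: float = 300.0       # assumed SLA safety margin


@dataclass(frozen=True)
class RunState:
    confidence: float
    risk_class: str
    retries: int
    seconds_to_sla: float


def escalation_reasons(policy: EscalationPolicy, state: RunState) -> list[str]:
    """Return every rule that fired; an empty list means the agent may continue."""
    reasons = []
    if state.confidence < policy.confidence_floor:
        reasons.append("confidence_below_floor")
    if state.risk_class in policy.escalate_risk_classes:
        reasons.append("risk_class")
    if state.retries >= policy.max_retries:
        reasons.append("retry_limit")
    if state.seconds_to_sla < policy.min_seconds_to_sla:
        reasons.append("sla_pressure")
    return reasons
```

Returning the reasons, rather than a bare boolean, gives the human who receives the handoff the context for why it happened.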
Issue to PR
An engineering workflow that takes an issue, gathers context across the codebase and traces, and proposes a pull request, or a runbook step when a code change is not the right move.
Tool Execution
Reliable invocation of MCP tools with timeouts, retries, idempotency, and trace propagation — the boring infrastructure that makes agents trustworthy.
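A minimal sketch of the retry and idempotency part, assuming the tool is a plain callable that accepts `idempotency_key`, `trace_id`, and `timeout_s` keyword arguments. That calling convention and the `TransientToolError` type are hypothetical, and enforcing the timeout is left to the tool itself.

```python
import time
import uuid
from typing import Any, Callable


class TransientToolError(Exception):
    """Assumed error type for failures that are safe to retry."""


def invoke_tool(tool: Callable[..., Any], args: dict, *, trace_id: str,
                max_attempts: int = 3, timeout_s: float = 5.0,
                backoff_s: float = 0.1,
                sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call a tool with retries, reusing one idempotency key across attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One key for the whole logical call, so a retried request that actually
    # succeeded the first time can be deduplicated on the server side.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(args, idempotency_key=idempotency_key,
                        trace_id=trace_id, timeout_s=timeout_s)
        except TransientToolError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: backoff_s, 2 * backoff_s, 4 * backoff_s, ...
            sleep(backoff_s * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper testable without real delays.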
Trace Replay
Deterministic replay of a workflow run from its stored trace — the debugging primitive that makes 'why did the agent do that' a question with an answer.
Risk Review
Pre-deployment review of workflow changes against risk class — what data is touched, what tools are invoked, what side effects are possible — beyond functional review.
Correction Capture
Capturing human corrections to agent output — text edits, action changes, reroutes — and converting them into eval cases, knowledge updates, or workflow changes.
Evals and optimization
Mutation, generated-code tests, regression datasets, gates, and deploy safety.
Generated-Code Tests
Test coverage for the agent-generated code that lives inside workflows — handlers, transformers, validators — so the eval set catches regressions in code, not just prompts.
Regression Datasets
Curated eval cases drawn from production failures — every fixed bug becomes a regression that future changes must clear.
Cost Reduction
Eval-harness-driven sweeps of models, prompts, retrieval depth, and tool budgets to find the cheapest configuration that still passes quality gates.
Quality Gates
Hard floors on the quality metrics a candidate must clear before promotion — encoded so a change cannot ship to production by accident.
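Encoding the floors can be as small as a dictionary and a comparison. The sketch below uses hypothetical metric names and treats a missing metric as a failure, which is an assumption about how strict the gate should be.

```python
def gate_failures(metrics: dict[str, float],
                  floors: dict[str, float]) -> dict[str, str]:
    """Map each failing metric to a short explanation; empty means all clear."""
    failures = {}
    for name, floor in floors.items():
        value = metrics.get(name)
        if value is None:
            # Fail closed: an unmeasured metric cannot clear its floor.
            failures[name] = "missing"
        elif value < floor:
            failures[name] = f"{value} is below floor {floor}"
    return failures


def can_promote(metrics: dict[str, float], floors: dict[str, float]) -> bool:
    """A candidate is promotable only when no floor is violated."""
    return not gate_failures(metrics, floors)
```

For example, `can_promote({"accuracy": 0.93}, {"accuracy": 0.90, "citation_rate": 0.95})` returns `False`, because `citation_rate` was never measured.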
Latency Budgets
Per-workflow p95 and p99 latency targets enforced as promotion gates and tracked in production — so 'feels slow' becomes a quantified, attributable signal.
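A sketch of the gate side of this, using the nearest-rank definition of a percentile over a list of observed latencies. The function names and the choice of nearest-rank are assumptions; a production system would more likely read percentiles from its metrics backend.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def within_budget(samples_ms: Sequence[float],
                  p95_budget_ms: float, p99_budget_ms: float) -> bool:
    """True when both the p95 and the p99 latency targets are met."""
    return (percentile(samples_ms, 95) <= p95_budget_ms
            and percentile(samples_ms, 99) <= p99_budget_ms)
```

Used as a promotion gate, `within_budget` would run over latencies collected during the eval run; the same check over a production window turns "feels slow" into a pass or fail signal.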
Safe Deploys
Canary, staged rollout, and rollback for agent workflow changes — the deployment primitives every other engineering team has, applied to LLM and prompt changes.
Eval Dashboards
Operational dashboards that show the eval-set score, regression count, latency, and cost over time — the visible state of the team's quality discipline.
Drift Alerts
Alerts when production behavior diverges from eval-set expectations: a model deprecation, a prompt change in an upstream library, or a corpus that has quietly aged.
Closed-loop intelligence
Conversation signal, root-cause analysis, knowledge updates, and review queues.
Support Thread Analysis
Aggregated analysis of support tickets — intent, subject, sentiment, resolution time, citations — feeding product, engineering, and ops with the same source of truth.
Webchat Signal
Pre-sale, on-site, and in-product webchat captured into the same signal layer as support — intent, friction, dropoff, mention.
Intent Routing
The layer that classifies what each conversation is actually asking for, so it can be routed to the workflow or owner able to handle it.
Sentiment Trends
Trend lines on sentiment, satisfaction, and dissatisfaction signals across channels and cohorts — calibrated against a CSAT or NPS baseline so the curve means something.
Product Signal
Feature requests, missing capabilities, competitor mentions, and 'wish-it-did-X' patterns extracted from real conversations and routed to product.
Bad Thread Clusters
Clusters of failed or low-quality threads, grouped by failure pattern, with example threads and a recommended owner — the unit of investment in conversation intelligence.
Root-Cause Analysis
Diagnosing whether a cluster of bad threads is retrieval, reasoning, tool, knowledge, or policy — and routing the fix to the right surface.
Resolution QA
Sampling and scoring resolved threads — agent-handled and human-handled — to calibrate quality, detect bad resolutions, and feed eval set additions.
Knowledge Updates
Resolved threads, corrections, and approved drafts flowing back into the knowledge base — versioned, attributed, and reviewable.
Eval Case Capture
Hard or high-value conversations promoted into the eval set — gold examples to preserve and regressions to prevent.
Human Review Queue
A unified queue for human review across approvals, low-confidence outputs, escalations, and flagged corrections — with SLAs and load balancing.