Resources for production agents.

Plain-language explanations and technical notes from our work building data foundations, agent runtimes, MCP tools, evals, and closed-loop intelligence. Start anywhere — Learn for concepts, Use cases for patterns, Research for open questions, AI Index for the A-Z.

Learn

The concept library: source graphs, agent runtimes, MCP tools, workflow evals, governance, and closed-loop knowledge.

Browse articles

Use cases

Practical workflow patterns for data foundations, governed agents, approvals, eval dashboards, and knowledge loops.

Browse patterns

Index

An A-Z map of AI terms, capabilities, workflow patterns, and infrastructure language used across the stack.

Browse terms

Featured guide

Evals before launch.

Featured resource

Workflow Evals

The test suite your AI workflows have to pass before any change reaches users — measuring quality, latency, cost, and safety on real production data instead of vibes.

Read guide

Featured research

Synthetic personality that holds.

Research

Synthetic Personality

Before an AI agent can be useful to anyone, it has to be something — a coherent identity that holds up across users, sessions, and adversarial pressure. This is the research track that defines what that means and how to keep it stable.

Read research

Latest articles

What we are publishing around agent infrastructure, evals, source graphs, and closed-loop operations.

Agent runtime

Agent Runtime

The execution engine that turns an AI agent from a chat-window demo into a long-running, event-driven, restartable process you can trust with real operations.

Research notes

Conversation Intelligence

Turning every approved conversation — support, email, team chat, customer messaging, voice, sales — into structured signal you can act on, instead of anecdotes that evaporate when a ticket closes.

Data substrate

Source Graph

A navigable map of every system your data lives in — schemas, documents, code, tickets, events, owners, and permissions — so an AI agent can find the right source and respect the right access boundary.

Highlights

Protocols, control planes, and operating practices behind the systems we launch.

Browse the library

A structured index for deeper concepts from the capability map.

Data substrate, Agent runtime, Evaluation, Operations, Research notes, Capability map

Data substrate

Source Graph

Vector Search

How an AI agent finds the right document, chunk, or row to ground its answer in — and why the part that matters is the pipeline around the database, not the database itself.

LLM-Ready Knowledge Base

A company knowledge base built so an AI system can cite real answers from it — sourced from documents, tickets, code, conversations, and structured records; chunked, embedded, permissioned, evaluated, and kept fresh on AWS.

Lakehouse Architecture

An architecture that combines data-lake economics (cheap object storage, open file formats) with warehouse guarantees (ACID transactions, schema evolution, time travel) so analytics, AI retrieval, and machine learning all read from the same trusted tables.

Ingestion Contracts

Explicit agreements between a data source and the systems that depend on it — what shape, how fresh, who owns it, what counts as broken — so pipeline failures become attributable instead of mysterious.
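
As an illustration of the idea above, a contract can be written down as data and checked on every batch. This is a minimal sketch, not a reference implementation: the `IngestionContract` class, its field names, and the `check_batch` helper are all hypothetical, chosen only to mirror the four questions in the description (what shape, how fresh, who owns it, what counts as broken).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class IngestionContract:
    """Hypothetical contract record; all field names are illustrative."""
    source: str                 # which upstream system this covers
    owner: str                  # who is accountable when it breaks
    required_fields: frozenset  # the agreed shape
    max_staleness: timedelta    # how fresh the data must be


def check_batch(contract, rows, last_updated, now=None):
    """Return a list of violations, each attributed to the contract owner."""
    now = now or datetime.now(timezone.utc)
    violations = []
    if now - last_updated > contract.max_staleness:
        violations.append(f"{contract.source}: stale data (owner: {contract.owner})")
    for i, row in enumerate(rows):
        missing = contract.required_fields - set(row)
        if missing:
            violations.append(
                f"{contract.source}: row {i} missing {sorted(missing)} "
                f"(owner: {contract.owner})"
            )
    return violations
```

Because every violation names the source and the owner, a failed batch points at a specific agreement rather than at the pipeline in general.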

Data Quality

Contracts, validation, lineage, freshness, and ownership for the data your AI reads from: an ongoing operating discipline, not a one-time cleanup project.

Agent runtime

Agent Runtime

MCP Tool Registry

A governed catalog of every tool an AI agent can call — your APIs, your databases, your internal systems — with typed schemas, permission scopes, audit trails, and the standard protocol (MCP) that turns 'we exposed it to the LLM' into 'we know exactly who called what when'.

Human Approval

Approval gates that put a human in the loop where correctness, risk, or accountability actually require human judgment — designed as part of the workflow, not as a panic button bolted on after launch.

Workflow Runtime

The engine that runs an AI agent workflow as a durable, observable, restartable process instead of a one-shot script — what separates an agent demo from an agent deployment.

Evaluation

Workflow Evals

Self-Optimizing Agents

AI workflows that propose, score, and promote their own variants — prompts, models, retrieval policies, tool budgets, generated code — under measurable constraints instead of intuition or vendor leaderboards.

Prompt and Model Diffs

Side-by-side measurement of a candidate prompt or model against the current production version on the same eval set — the unit of safe change in a serious AI workflow.
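
A side-by-side diff of this kind can be sketched in a few lines. The sketch assumes each version is a plain callable and that the caller supplies a numeric `score` function. The name `diff_on_eval_set` and the case keys `id`, `input`, and `expected` are hypothetical.

```python
def diff_on_eval_set(baseline, candidate, eval_set, score):
    """Run both versions on the same cases and report per-case and mean deltas."""
    rows = []
    for case in eval_set:
        b = score(baseline(case["input"]), case["expected"])
        c = score(candidate(case["input"]), case["expected"])
        rows.append({"id": case["id"], "baseline": b, "candidate": c, "delta": c - b})
    n = len(rows) or 1  # avoid dividing by zero on an empty eval set
    return {
        "mean_delta": sum(r["delta"] for r in rows) / n,
        "regressions": [r["id"] for r in rows if r["delta"] < 0],
        "rows": rows,
    }
```

Listing the regressed case ids alongside the mean matters because an unchanged average can hide cases that got worse.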

Promotion Gates

The thresholds an AI change must clear before it reaches production — quality, latency, cost, memory, safety — enforced by CI, not by hope.
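
One way to picture a gate enforced by CI is a small check that turns eval metrics into an exit code. This is a sketch under stated assumptions: the metric names, the limits in `GATES`, and both function names are hypothetical placeholders, not values or APIs from any particular system.

```python
# Hypothetical gates: (direction, limit). Real limits would come from your own baselines.
GATES = {
    "quality": ("min", 0.90),
    "safety": ("min", 0.99),
    "latency_ms": ("max", 1200.0),
    "cost_usd": ("max", 0.02),
    "memory_mb": ("max", 512.0),
}


def failed_gates(metrics, gates=GATES):
    """Return one message per gate the candidate change does not clear."""
    failures = []
    for name, (direction, limit) in gates.items():
        value = metrics.get(name)
        if value is None:
            # A missing metric counts as a failure rather than a silent pass.
            failures.append(f"{name}: metric missing")
        elif direction == "min" and value < limit:
            failures.append(f"{name}: {value} is below the minimum {limit}")
        elif direction == "max" and value > limit:
            failures.append(f"{name}: {value} is above the maximum {limit}")
    return failures


def ci_exit_code(metrics, gates=GATES):
    """0 lets the pipeline continue; 1 blocks the promotion."""
    return 1 if failed_gates(metrics, gates) else 0
```

A CI step could pass the result of `ci_exit_code` to `sys.exit`, so a change that misses any threshold fails the build.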

Operations

Model Routing

How an AI system decides which model to call for each step — based on privacy, cost, latency, quality, and what happens when a provider goes down.
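
A routing decision like this can be sketched as a filter followed by a choice. The example below is illustrative only: `ModelOption`, its fields, and the cheapest-eligible policy are assumptions, and latency is left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    """Hypothetical description of one callable model."""
    name: str
    cost_per_call: float
    quality: float        # score from your own evals, 0.0 to 1.0
    private: bool         # True if data stays inside your boundary
    healthy: bool = True  # False while the provider is down


def route(options, needs_private, min_quality):
    """Pick the cheapest healthy model that meets the privacy and quality needs."""
    eligible = [
        m for m in options
        if m.healthy
        and m.quality >= min_quality
        and (m.private or not needs_private)
    ]
    if not eligible:
        raise LookupError("no eligible model for this step")
    return min(eligible, key=lambda m: m.cost_per_call)
```

Marking a provider as unhealthy removes it from the eligible set, so the same call falls through to the next cheapest option.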

Agent Observability

Trace-level visibility into every model call, retrieval, tool invocation, decision, approval, and failure inside an AI workflow — the substrate every other discipline (evals, optimization, governance) reads from.

Governance

The policy layer for what an AI system is allowed to read, call, decide, and ship — encoded as configuration the runtime enforces, not as a document on a shared drive.

Research notes

Conversation Intelligence

Chat Orchestration Runtime

The end-to-end architecture of modern conversational AI systems: model-agnostic, client-agnostic, plugin-driven runtimes that coordinate intent, context, retrieval, tools, reasoning, reflection, memory, and rendering — with the LLM as one interchangeable component, not the system.

AI-Native Dashboards

A study on conversational, adaptive, living dashboard interfaces — workspaces that begin as a blank canvas with a single conversational input and build themselves in real time as the user asks, persisting widgets, layouts, and memory across sessions.

Prompt-Native Widgets

Generative, context-aware dashboard components whose logic and rendering are defined by natural language prompts rather than hardcoded configurations — runtime-generated analytical surfaces that retrieve, reason, link, and adapt instead of merely displaying.

Production Agent Interfaces

The chat surface as an operating console — knowledge bases plugged in, tools connected, agents on a roster, with real-time visibility into context budget, token spend, model choice, and concrete savings opportunities. The interface that lets a team actually run an agent in production, not just demo one.

Conversation Listeners

Opt-in listeners that capture conversations from every channel an organization uses — support, email, team chat, customer messaging, webchat, sales tools, voice — and route them into the signal-extraction pipeline with consent and retention rules attached.

Signal Extraction

Turning raw conversation transcripts into structured fields — intent, subject, sentiment, CSAT, tool performance, product mentions — that downstream systems can query, dashboard, and act on.

Conversation Forensics

Incident detection and root-cause analysis on human↔agent conversations — replaying threads, reading the context around negative sentiment, extracting whether the user actually resolved their problem, and turning the answer into a learning artifact the system can use next time.

Capability map

Synthetic Personality

Agent Memory

How an AI agent remembers the user it serves (what they said before, what they prefer, what context not to repeat) without letting that memory change the agent's behavior for everyone else.

Skill Distillation

How an educated AI agent, the one with the codebase, the tools, and the tacit context that lets it succeed, distills its competence into transferable skill documents that a fresh agent with none of that context can run from scratch.

Closed-Loop Knowledge

How an AI system gets durably better at its job — not by being smarter, but by routing every production failure into either a knowledge update, an eval case, a workflow patch, or a documented exception with a named owner.