Ingestion Contracts
Explicit agreements between a data source and the systems that depend on it — what shape, how fresh, who owns it, what counts as broken — so pipeline failures become attributable instead of mysterious.
What a contract specifies
A contract covers five things: a schema (typed fields with semantics, not just SQL types); a freshness SLO (for example, data is at most N minutes old in 99% of measurements); ownership (the team or person on call for breakages); an error budget (the acceptable rate of late or malformed records before alerts fire); and a breakage policy (what happens when the contract is violated: dead-letter, retry, or hard fail).
- Typed schema with semantic metadata
- Freshness SLO with measurement method
- Named owner and on-call rotation
- Error budget and breakage policy
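The elements above could be captured in code roughly as follows. This is a minimal sketch, not a reference implementation: the class names (IngestionContract, Field, BreakagePolicy), the "orders" source, the "payments-data" owner, and the threshold values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class BreakagePolicy(Enum):
    """What to do with data that violates the contract (hypothetical names)."""
    DEAD_LETTER = "dead_letter"
    RETRY = "retry"
    HARD_FAIL = "hard_fail"


@dataclass(frozen=True)
class Field:
    name: str
    type: type
    meaning: str  # semantic metadata, e.g. unit or business definition


@dataclass(frozen=True)
class IngestionContract:
    source: str
    fields: tuple[Field, ...]
    max_age: timedelta        # the "N minutes" in the freshness SLO
    freshness_target: float   # fraction of measurements that must be fresh
    owner: str                # team or rotation that gets paged
    error_budget: float       # tolerated fraction of malformed records
    on_breach: BreakagePolicy

    def is_well_formed(self, record: dict) -> bool:
        # A record passes if every contracted field is present with the right type.
        return all(
            f.name in record and isinstance(record[f.name], f.type)
            for f in self.fields
        )

    def freshness_met(self, ages: list[timedelta]) -> bool:
        # ages: observed data age at each measurement point.
        if not ages:
            return False  # no measurements means the SLO cannot be confirmed
        fresh = sum(1 for age in ages if age <= self.max_age)
        return fresh / len(ages) >= self.freshness_target

    def budget_exceeded(self, records: list[dict]) -> bool:
        if not records:
            return False
        bad = sum(1 for r in records if not self.is_well_formed(r))
        return bad / len(records) > self.error_budget


# Illustrative contract; every value here is an assumption.
contract = IngestionContract(
    source="orders",
    fields=(
        Field("order_id", str, "Unique order identifier"),
        Field("amount_cents", int, "Order total in cents"),
    ),
    max_age=timedelta(minutes=15),
    freshness_target=0.99,
    owner="payments-data",
    error_budget=0.01,
    on_breach=BreakagePolicy.DEAD_LETTER,
)
```

Keeping the owner and breakage policy on the same object as the schema means a failed check already carries who to notify and what to do next, which is the point of the contract.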
Why contracts beat best-effort
Without contracts, every breakage is a forensic investigation. With contracts, breakages route automatically to the responsible owner with the context they need. The downstream agent or dashboard learns to trust the data — and the freshness telemetry lets retrieval skip or downgrade sources that have silently gone stale.
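One way to picture both effects, routing to owners and downgrading stale sources at retrieval time, is the sketch below. It assumes freshness telemetry is available as a last-updated timestamp per source; SourceStatus, route_breaches, retrieval_weights, and the example source and owner names are hypothetical, and the 0.5 penalty is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SourceStatus:
    name: str
    owner: str
    last_updated: datetime   # freshness telemetry for the source
    max_age: timedelta       # freshness threshold from its contract


def is_stale(source: SourceStatus, now: datetime) -> bool:
    return now - source.last_updated > source.max_age


def route_breaches(sources: list[SourceStatus], now: datetime) -> dict[str, list[str]]:
    """Group stale sources by owner so each breach reaches the responsible team."""
    by_owner: dict[str, list[str]] = {}
    for s in sources:
        if is_stale(s, now):
            by_owner.setdefault(s.owner, []).append(s.name)
    return by_owner


def retrieval_weights(
    sources: list[SourceStatus], now: datetime, stale_penalty: float = 0.5
) -> dict[str, float]:
    """Weight fresh sources at 1.0 and stale ones at stale_penalty.

    A penalty of 0.0 drops stale sources from retrieval entirely.
    """
    weights: dict[str, float] = {}
    for s in sources:
        weight = stale_penalty if is_stale(s, now) else 1.0
        if weight > 0.0:
            weights[s.name] = weight
    return weights


# Illustrative data; a fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sources = [
    SourceStatus("orders", "payments-data", now - timedelta(minutes=5), timedelta(minutes=15)),
    SourceStatus("inventory", "supply-data", now - timedelta(hours=2), timedelta(minutes=30)),
]
breaches = route_breaches(sources, now)
weights = retrieval_weights(sources, now)
```

In a real system the grouped breaches would feed a paging tool and the weights would feed the retriever's ranking; both integrations are left out here because their details depend on the tools in use.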
Tooling
Schema Registry (Confluent / Apicurio) for streaming; dbt or SQLMesh tests for warehouse contracts; Soda or Great Expectations for validation; OpenLineage for downstream impact; PagerDuty or Opsgenie for the actual page when a contract is breached.
Related resources
An architecture that combines data-lake economics (cheap object storage, open file formats) with warehouse guarantees (ACID transactions, schema evolution, time travel) so analytics, AI retrieval, and machine learning all read from the same trusted tables.
Contracts, validation, lineage, freshness, and ownership for the data your AI reads from: an ongoing operating discipline, not a one-time cleanup project.
A navigable map of every system your data lives in — schemas, documents, code, tickets, events, owners, and permissions — so an AI agent can find the right source and respect the right access boundary.