Data substrate

Ingestion Contracts

Explicit agreements between a data source and the systems that depend on it — what shape, how fresh, who owns it, what counts as broken — so pipeline failures become attributable instead of mysterious.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What a contract specifies

A contract has five parts. Schema: typed fields with semantics, not just SQL types. Freshness SLO: data is at most N minutes old in 99% of measurements, with the measurement method stated. Ownership: the team or person on call for breakages. Error budget: the acceptable rate of late or malformed records before alerts fire. Breakage policy: what happens when the contract is violated, such as routing records to a dead-letter queue, retrying, or failing hard.

  • Typed schema with semantic metadata
  • Freshness SLO with measurement method
  • Named owner and on-call rotation
  • Error budget and breakage policy
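As an illustration only, the parts listed above could be captured in a small Python structure. This is a minimal sketch, not a standard format: the class names (FieldSpec, IngestionContract, BreakagePolicy), the example source "orders_stream", the owner and rotation names, and all numeric thresholds are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class BreakagePolicy(Enum):
    DEAD_LETTER = "dead_letter"
    RETRY = "retry"
    HARD_FAIL = "hard_fail"


@dataclass(frozen=True)
class FieldSpec:
    name: str
    py_type: type
    meaning: str            # semantic metadata: unit, business definition, etc.
    required: bool = True


@dataclass(frozen=True)
class IngestionContract:
    source: str
    fields: tuple[FieldSpec, ...]
    max_age_minutes: float   # the "N" in the freshness SLO
    freshness_target: float  # fraction of measurements that must be fresh
    owner: str
    on_call_rotation: str
    error_budget: float      # tolerated fraction of late or malformed records
    policy: BreakagePolicy

    def violations(self, record: dict) -> list[str]:
        """List schema problems for one record; an empty list means it conforms."""
        problems = []
        for spec in self.fields:
            value = record.get(spec.name)
            if value is None:
                if spec.required:
                    problems.append(f"missing field: {spec.name}")
            elif not isinstance(value, spec.py_type):
                problems.append(f"wrong type for {spec.name}")
        return problems

    def freshness_met(self, ages_minutes: list[float]) -> bool:
        """True if enough of the sampled data ages fall within the SLO."""
        if not ages_minutes:
            return False  # no measurements: treat the SLO as unmet
        fresh = sum(1 for age in ages_minutes if age <= self.max_age_minutes)
        return fresh / len(ages_minutes) >= self.freshness_target

    def budget_exhausted(self, bad: int, total: int) -> bool:
        """True once the share of bad records exceeds the error budget."""
        return total > 0 and bad / total > self.error_budget


# Hypothetical contract for an imaginary order stream.
orders = IngestionContract(
    source="orders_stream",
    fields=(
        FieldSpec("order_id", str, "globally unique order identifier"),
        FieldSpec("amount_cents", int, "order total in minor currency units"),
        FieldSpec("coupon", str, "promotional code, if any", required=False),
    ),
    max_age_minutes=15,
    freshness_target=0.99,
    owner="payments-data",
    on_call_rotation="payments-data-primary",
    error_budget=0.001,
    policy=BreakagePolicy.DEAD_LETTER,
)
```

Calling orders.violations(record) on each incoming record and orders.budget_exhausted(bad, total) per time window is one way to decide when the breakage policy should apply; a real deployment would more likely express the same fields in whatever registry or test framework the team already uses.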

Why contracts beat best-effort

Without contracts, every breakage is a forensic investigation. With contracts, breakages route automatically to the responsible owner with the context they need. The downstream agent or dashboard learns to trust the data — and the freshness telemetry lets retrieval skip or downgrade sources that have silently gone stale.
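The routing and skip-or-downgrade behaviour described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the source dictionary keys, the weights (1.0, 0.5, 0.0), and the grace factor of 2 are hypothetical, not taken from any particular tool.

```python
from datetime import datetime, timedelta, timezone


def freshness_weight(last_updated, max_age, now, grace_factor=2.0):
    """Return 1.0 within the SLO, 0.5 (downgraded) within the grace window, else 0.0 (skip)."""
    age = now - last_updated
    if age <= max_age:
        return 1.0
    if age <= max_age * grace_factor:
        return 0.5
    return 0.0


def triage(sources, now):
    """Split sources into usable (name, weight) pairs and alerts addressed to owners."""
    usable, alerts = [], []
    for src in sources:
        weight = freshness_weight(src["last_updated"], src["max_age"], now)
        if weight > 0:
            usable.append((src["name"], weight))
        if weight < 1.0:
            # The alert carries the owner and the context they need to act.
            alerts.append({
                "route_to": src["owner"],
                "source": src["name"],
                "age_minutes": (now - src["last_updated"]).total_seconds() / 60,
                "slo_minutes": src["max_age"].total_seconds() / 60,
            })
    usable.sort(key=lambda pair: pair[1], reverse=True)
    return usable, alerts


# Hypothetical sources, all with a 15-minute freshness SLO.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
slo = timedelta(minutes=15)
sources = [
    {"name": "orders", "owner": "payments-data", "max_age": slo,
     "last_updated": now - timedelta(minutes=5)},
    {"name": "inventory", "owner": "supply-data", "max_age": slo,
     "last_updated": now - timedelta(minutes=20)},
    {"name": "reviews", "owner": "content-data", "max_age": slo,
     "last_updated": now - timedelta(minutes=60)},
]
usable, alerts = triage(sources, now)
```

In this sketch a retrieval layer would read only from usable, preferring higher weights, while each alert would be handed to whatever paging system the owning team uses.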

Tooling

Schema Registry (Confluent / Apicurio) for streaming schemas; dbt or SQLMesh tests for warehouse contracts; Soda or Great Expectations for validation; OpenLineage for lineage and downstream impact analysis; PagerDuty or Opsgenie for the actual page when a contract is breached.

Related resources