Use case

Event Stream Ingestion

Kafka, Kinesis, or Pulsar streams landed in the lakehouse with exactly-once semantics, schema enforcement, and per-stream contracts.

Overview

Event streams power real-time agents, alerts, and CDC pipelines. Landing them in the lakehouse with exactly-once guarantees intact and no silent schema drift is the boring infrastructure that everything else depends on.

What it solves

Replaces fragile bespoke loaders with a contracted ingestion path: schema-enforced and idempotent, with a dead-letter table for bad records and a freshness SLO that downstream consumers can rely on.
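A minimal sketch of the reject-don't-drop path, using only the Python standard library. The contract is a hypothetical dict of field names to types standing in for a registered schema; `ORDER_CONTRACT`, `violations`, and `route` are illustrative names, not part of any registry's API. Valid records land; invalid ones go to the dead-letter list with their reasons attached, so an owner can be notified instead of the record vanishing.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical contract for an "orders" stream: field name -> required type.
# In production this would be derived from the schema registered for the stream.
ORDER_CONTRACT: dict[str, type] = {"order_id": str, "amount_cents": int, "ts": str}


@dataclass
class IngestResult:
    landed: list[dict[str, Any]] = field(default_factory=list)
    dead_letter: list[dict[str, Any]] = field(default_factory=list)


def violations(record: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return the contract violations for one record; an empty list means valid."""
    problems = []
    for name, expected in contract.items():
        if name not in record:
            problems.append(f"missing field {name!r}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(f"{name!r} is {actual}, expected {expected.__name__}")
    extra = set(record) - set(contract)
    if extra:
        # Strict by choice in this sketch: unknown fields are rejected too.
        problems.append(f"unexpected fields {sorted(extra)}")
    return problems


def route(records: list[dict[str, Any]], contract: dict[str, type]) -> IngestResult:
    """Split a batch into records that land and records sent to dead-letter."""
    result = IngestResult()
    for rec in records:
        problems = violations(rec, contract)
        if problems:
            # Keep the bad record and the reasons so its owner can be notified.
            result.dead_letter.append({"record": rec, "errors": problems})
        else:
            result.landed.append(rec)
    return result


batch = [
    {"order_id": "a1", "amount_cents": 1250, "ts": "2024-05-01T12:00:00Z"},
    {"order_id": "a2", "amount_cents": "12.50", "ts": "2024-05-01T12:00:01Z"},
]
out = route(batch, ORDER_CONTRACT)
print(len(out.landed), len(out.dead_letter))  # 1 1
print(out.dead_letter[0]["errors"])  # ["'amount_cents' is str, expected int"]
```

Rejecting unknown fields is a choice made for this sketch; a real contract might tolerate them, depending on the compatibility rules chosen for the stream.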

How we build it

Stream consumers (Flink, Spark Structured Streaming, Kafka Connect) write to Apache Iceberg or Delta Lake tables with exactly-once delivery configured. A schema registry (Confluent Schema Registry or Apicurio Registry) enforces the contract; records that fail validation go to a dead-letter table, and freshness telemetry is wired into the same monitoring as the rest of the data substrate.
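The core idea behind exactly-once landing can be sketched without any streaming engine: the sink commits the source offsets it has consumed in the same atomic step as the rows, so a batch replayed after a crash is recognized and skipped. The engines named above do this with their own mechanisms (checkpoints, offset logs, transaction metadata stored with the table), and the guarantee depends on the specific sink. The in-memory `Table` below is a hypothetical stand-in that collapses all of that into one method, and it assumes offsets increase within each partition, as they do within a Kafka partition.

```python
from dataclasses import dataclass, field
from typing import Any

# (topic, partition, offset, value), as a consumer would hand it to the sink.
Record = tuple[str, int, int, dict[str, Any]]


@dataclass
class Table:
    """Hypothetical in-memory stand-in for an Iceberg or Delta Lake table."""

    rows: list[dict[str, Any]] = field(default_factory=list)
    # (topic, partition) -> highest source offset already committed with the data.
    committed: dict[tuple[str, int], int] = field(default_factory=dict)

    def commit_batch(self, batch: list[Record]) -> int:
        """Land a micro-batch idempotently and return the number of rows written.

        Assumes offsets increase within each partition, as they do in Kafka.
        """
        new_rows: list[dict[str, Any]] = []
        new_offsets = dict(self.committed)
        for topic, partition, offset, value in batch:
            key = (topic, partition)
            if offset <= new_offsets.get(key, -1):
                continue  # already landed by an earlier attempt: skip the replay
            new_rows.append(value)
            new_offsets[key] = offset
        # Rows and offsets are published together. A real table format gets
        # this atomicity from its snapshot or transaction-log commit.
        self.rows.extend(new_rows)
        self.committed = new_offsets
        return len(new_rows)


table = Table()
first = [("orders", 0, 0, {"id": "a"}), ("orders", 0, 1, {"id": "b"})]
print(table.commit_batch(first))  # 2
# Crash before the consumer acknowledged: the same batch is delivered again.
print(table.commit_batch(first))  # 0
print(table.commit_batch(first + [("orders", 0, 2, {"id": "c"})]))  # 1
print([r["id"] for r in table.rows])  # ['a', 'b', 'c']
```

Storing offsets next to the data, rather than only in the consumer group, is what lets a replay be detected at write time: the table itself knows what it already contains.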

  • Exactly-once configuration end to end
  • Schema Registry enforcement and evolution
  • Dead-letter tables with owner notifications
  • Freshness telemetry per stream
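Per-stream freshness can be expressed as a small check: the lag is the time since the newest event landed for each stream, compared against that stream's SLO. A sketch in Python, with hypothetical stream names and thresholds in `FRESHNESS_SLO`; in practice the newest timestamp would be read from the table (for example, the maximum event time in the latest commit) and the result would feed the shared monitoring rather than be printed.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

# Hypothetical per-stream SLOs: maximum acceptable age of the newest landed event.
FRESHNESS_SLO: dict[str, timedelta] = {
    "orders": timedelta(minutes=5),
    "clickstream": timedelta(minutes=1),
}


def freshness_report(
    latest_event: dict[str, datetime],
    slos: dict[str, timedelta],
    now: datetime,
) -> dict[str, dict[str, Any]]:
    """Compute lag per stream and flag SLO breaches."""
    report: dict[str, dict[str, Any]] = {}
    for stream, slo in slos.items():
        newest = latest_event.get(stream)
        if newest is None:
            # A stream with nothing landed is reported as a breach, not skipped.
            report[stream] = {"lag": None, "breach": True}
            continue
        lag = now - newest
        report[stream] = {"lag": lag, "breach": lag > slo}
    return report


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
latest = {
    "orders": now - timedelta(minutes=2),
    "clickstream": now - timedelta(minutes=3),
}
for stream, status in freshness_report(latest, FRESHNESS_SLO, now).items():
    print(stream, status["lag"], "BREACH" if status["breach"] else "ok")
# orders 0:02:00 ok
# clickstream 0:03:00 BREACH
```

Treating a stream with no landed data as a breach is deliberate here: a stream that silently stops should alert someone, not drop out of the report.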

What changes when it is in place

Streams become first-class sources in the source graph, with the same observability and ownership as the rest of the data substrate.