Lakehouse
An architecture that combines cheap object storage with warehouse-grade table guarantees — the substrate where analytics, AI retrieval, and ML training read from the same governed tables.
What it is
A lakehouse is data-lake economics with data-warehouse correctness. Data files are stored, typically as Parquet, on cheap object storage (S3, GCS, Azure Blob Storage) and managed by an open table format (Apache Iceberg, Delta Lake, Apache Hudi) that adds ACID transactions, schema evolution, and time travel. The result is one substrate, queried by many engines.
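As a rough illustration of what the table format adds on top of plain files, here is a toy in-memory model in Python. It is a sketch under stated assumptions, not how Iceberg, Delta Lake, or Hudi are implemented: `ToyTable`, `Snapshot`, `commit`, and `scan` are hypothetical names, a dict stands in for object storage, and lists of dicts stand in for Parquet files.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """One committed table version: a schema plus the data files it covers."""
    version: int
    schema: tuple
    files: tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table format's metadata layer."""
    object_store: dict = field(default_factory=dict)  # path -> rows (assumed store)
    snapshots: list = field(default_factory=list)     # ordered commit log

    def current_version(self) -> int:
        return self.snapshots[-1].version if self.snapshots else 0

    def commit(self, expected_version: int, path: str, rows: list, schema: tuple) -> Snapshot:
        # Optimistic concurrency: reject the write if another commit landed first.
        if expected_version != self.current_version():
            raise RuntimeError("commit conflict: table changed since it was read")
        # Data files are immutable: written once, never modified in place.
        if path in self.object_store:
            raise ValueError(f"file already exists: {path}")
        self.object_store[path] = rows
        prev_files = self.snapshots[-1].files if self.snapshots else ()
        snap = Snapshot(expected_version + 1, schema, prev_files + (path,))
        self.snapshots.append(snap)  # the single step that publishes the write
        return snap

    def scan(self, version: Optional[int] = None) -> list:
        """Read the table as of a version (time travel); default is the latest."""
        if not self.snapshots:
            return []
        if version is None:
            version = self.current_version()
        if not 1 <= version <= len(self.snapshots):
            raise ValueError(f"unknown version: {version}")
        snap = self.snapshots[version - 1]
        rows = []
        for path in snap.files:
            for row in self.object_store[path]:
                # Schema evolution: columns added later read as None from older files.
                rows.append({col: row.get(col) for col in snap.schema})
        return rows


table = ToyTable()
table.commit(0, "data/part-0.parquet", [{"id": 1, "amount": 10}], ("id", "amount"))
table.commit(1, "data/part-1.parquet",
             [{"id": 2, "amount": 5, "region": "eu"}], ("id", "amount", "region"))

latest = table.scan()             # both rows, read through the newest schema
as_of_v1 = table.scan(version=1)  # only the first file, with the original schema
```

In this sketch, readers only ever see whole snapshots, which is the property the ACID guarantee refers to. A writer holding a stale version number gets a conflict instead of silently overwriting, and older versions stay readable because data files are never changed in place.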
Why it matters
Before lakehouses, organizations often kept several copies of the same data: a warehouse for analytics, a separate lake for ML, and a third copy for AI. The copies drifted, and the numbers disagreed. The lakehouse removes the duplicates, so analytics and AI read from the same trusted tables.
How it works
See the dedicated Lakehouse Architecture article for the full mechanism.
Related resources
An architecture that combines data-lake economics (cheap object storage, open file formats) with warehouse guarantees (ACID transactions, schema evolution, time travel) so analytics, AI retrieval, and machine learning all read from the same trusted tables.
A capability in the Group e-media information AI stack. This resource connects the subject to data substrate, agent runtime, evals, and operations.
Contracts, validation, lineage, freshness, and ownership for the data your AI reads from: an ongoing operating discipline, not a one-time cleanup project.