Data substrate

RAG (Retrieval-Augmented Generation)

The pattern where an AI agent retrieves relevant context from your data before generating an answer — instead of relying only on what the model learned during training.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

Retrieval-Augmented Generation, or RAG, is the most common pattern for grounding AI answers in your own data. Step one: take the user's question and search your indexed content for relevant chunks. Step two: send those chunks plus the question to the model and ask it to answer using them. The model gets to use your data, you get to control what it sees, and the answer can cite its sources.
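The two steps above can be sketched in a few lines. This is a toy illustration, not a production design: the word-overlap scoring stands in for a real search index, the `CHUNKS` data and the function names `retrieve` and `build_prompt` are made up for the example, and the final call to a model is left out because it depends on your provider.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, chunks: list[dict], k: int = 2) -> list[dict]:
    """Step one: rank chunks by shared words with the question (toy scoring)."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(c["text"])), c) for c in chunks]
    scored = [(score, c) for score, c in scored if score > 0]  # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)        # stable sort
    return [c for _, c in scored[:k]]

def build_prompt(question: str, retrieved: list[dict]) -> str:
    """Step two: put the retrieved chunks and the question into one prompt."""
    context = "\n".join(f"[{c['id']}] {c['text']}" for c in retrieved)
    return (
        "Answer using only the sources below and cite their ids in brackets.\n"
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical indexed content, invented for this example.
CHUNKS = [
    {"id": "hr-1", "text": "Employees accrue 20 vacation days per year."},
    {"id": "it-4", "text": "VPN access requires a hardware security key."},
    {"id": "hr-7", "text": "Unused vacation days expire on March 31."},
]

question = "How many vacation days do employees get?"
top = retrieve(question, CHUNKS)
prompt = build_prompt(question, top)
print(prompt)  # this string is what you would send to the model
```

Because the prompt contains only the retrieved chunks, each tagged with an id, you control what the model sees and can check any citation in its answer against those ids.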

Why it matters

Without RAG, an AI assistant answers from the general knowledge baked into the model, which is incomplete, possibly outdated, and unaware of anything private to your company. RAG is how 'AI that knows nothing about us' becomes 'AI that cites our docs.' Done well, it's table stakes for any internal AI application. Done badly, it's a major source of hallucinations, because the model answers confidently from wrong or missing context.

How it works

Index your content first: split it into chunks, compute embeddings, and attach metadata. At query time, retrieve candidates (hybrid: BM25 plus dense vectors), rerank them, filter by the user's permissions, and send the top-k to the model with instructions to cite. Most production RAG bugs are at the retrieval step, not the generation step, which is why Vector Search is treated as a system, not a database call.
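The query-time half of that pipeline might look roughly like the sketch below. It assumes the BM25 and dense searches have already run and each returned a ranked list of document ids. The text above does not say how to merge the two lists, so reciprocal rank fusion (RRF) is used here as one common choice. All function names, the `acl` mapping, and the sample ids are hypothetical, and the reranker is an optional callable because real rerankers are model-specific.

```python
from typing import Callable, Optional

def rrf_fuse(rankings: list[list[str]], c: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(doc) = sum over lists of 1 / (c + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

def filter_by_permissions(
    doc_ids: list[str], acl: dict[str, set[str]], user_groups: set[str]
) -> list[str]:
    """Keep documents that share a group with the user; no ACL entry means deny."""
    return [d for d in doc_ids if acl.get(d, set()) & user_groups]

def select_context(
    lexical: list[str],
    dense: list[str],
    acl: dict[str, set[str]],
    user_groups: set[str],
    k: int = 3,
    rerank: Optional[Callable[[list[str]], list[str]]] = None,
) -> list[str]:
    """Fuse both rankings, optionally rerank, filter by permissions, take top-k."""
    candidates = rrf_fuse([lexical, dense])
    if rerank is not None:
        candidates = rerank(candidates)  # e.g. a cross-encoder (not shown)
    allowed = filter_by_permissions(candidates, acl, user_groups)
    return allowed[:k]

# Hypothetical search results and access-control list, invented for the example.
lexical_hits = ["doc-a", "doc-b", "doc-c"]   # from a BM25 search
dense_hits = ["doc-b", "doc-d", "doc-a"]     # from an embedding search
acl = {
    "doc-a": {"eng"},
    "doc-b": {"finance"},
    "doc-c": {"eng", "all"},
    "doc-d": {"all"},
}

context_ids = select_context(lexical_hits, dense_hits, acl, {"eng", "all"}, k=2)
print(context_ids)  # ['doc-a', 'doc-d']
```

In this example `doc-b` ranks first after fusion but never reaches the model, because the user is not in the `finance` group. Filtering before the top-k cut means a blocked document does not use up one of the k slots.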
