Quality happens at storage, not at search
The common fix when a knowledge base gives bad answers is to tune the retrieval. Change the embedding model. Adjust the chunking strategy. Add a reranker. These are real tools and they help.
What they cannot fix is a storage problem. If the artifact in the index is ambiguous, underspecified, or missing context that only exists in the head of the person who wrote it, no retrieval improvement will surface a good answer. The information is not there. It was never stored.
The quality of a knowledge base is determined at the moment content goes in. That is not an argument against good retrieval. It is an argument for investing in the right layer first.
Two problems that look like one
Retrieval failures and storage failures produce the same symptom: a bad answer. But only one of them is a retrieval problem.
A retrieval failure means the right content exists in the index but does not get surfaced. The embedding did not capture the meaning well. The chunk boundary cut the relevant sentence in half. The ranking put a less relevant result first. These are fixable with retrieval-layer work.
A storage failure means the content does not contain enough information to answer the question, regardless of how retrieval is configured. A procedure that says “restart the service” without naming which service. A troubleshooting note that says “this usually fixes it” without saying what “it” refers to. A policy document that mentions “the standard form” without specifying which one.
The retrieval system finds these documents. They score well on similarity. The answer is still wrong because the document was incomplete when it was stored. No embedding model changes that. The missing information was never in the index.
What storage-time quality means in practice
Two techniques illustrate what it looks like to fix quality at the storage layer rather than the retrieval layer.
Contextual retrieval (a technique published by Anthropic in 2024) works like this: before a chunk of text is stored, a brief context sentence is prepended to it, explaining where this piece fits in the larger document. A chunk stored as “the return period is 30 days” becomes “In the refund policy section: the return period is 30 days.” The embedding now captures the chunk in its context. When a user asks about returns, the match is stronger and more accurate. Anthropic reported roughly a 49 percent reduction in retrieval failures on their benchmark after applying this technique.
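A minimal sketch of the ingest step, in Python. The `generate_context` function is a hypothetical stand-in for the LLM call that situates the chunk within its document; the names and wording here are illustrative, not Anthropic's implementation.

```python
def generate_context(document_title: str, section: str) -> str:
    # Hypothetical stand-in for an LLM call that, given the full
    # document and the chunk, writes one situating sentence.
    return f"In the {section} section of {document_title}:"

def contextualize(chunk: str, document_title: str, section: str) -> str:
    """Prepend a situating sentence so the embedding captures context."""
    return f"{generate_context(document_title, section)} {chunk}"

enriched = contextualize(
    "the return period is 30 days",
    document_title="the customer policy handbook",
    section="refund policy",
)
# The enriched string, not the bare chunk, is what gets embedded and indexed.
```

The key point is where this runs: once, at storage time, before anything is embedded. Every later query benefits from the enriched text without the retrieval layer changing at all.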
HyPE (Hypothetical Prompt Embeddings) works differently. Instead of enriching the chunk text itself, the system generates a set of questions that the chunk would answer, and stores those questions alongside the chunk. When a user asks something, the system can now match against the pre-generated questions rather than just the raw document text. A chunk about configuring webhooks will now match questions phrased in ways that look nothing like how the documentation was written, because those questions were already generated at ingest and stored pointing back to the chunk.
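A toy illustration of why matching against stored questions helps. Real systems compare embeddings; the word-overlap score below is a deliberately crude stand-in, and the chunk text, questions, and `StoredChunk` structure are all assumptions for the sketch, not the HyPE paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class StoredChunk:
    text: str
    hypothetical_questions: list[str] = field(default_factory=list)

def overlap(query: str, text: str) -> float:
    """Crude similarity: fraction of query words that appear in the text."""
    qw, tw = set(query.lower().split()), set(text.lower().split())
    return len(qw & tw) / len(qw) if qw else 0.0

chunk = StoredChunk(
    text="Set the callback_url field and enable event delivery in settings.",
    hypothetical_questions=[
        # Generated once at ingest by asking an LLM what this chunk answers.
        "How do I configure webhooks?",
        "Why is my webhook not firing?",
    ],
)

query = "how do i configure webhooks"
score_raw = overlap(query, chunk.text)
score_hype = max(overlap(query, q) for q in chunk.hypothetical_questions)
# The query shares no words with the raw documentation text, but matches
# the pre-generated question almost exactly.
```

The raw chunk never mentions "webhooks" or "configure", so it scores zero against the query; the stored question scores highly and retrieves the chunk it points to.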
Both techniques run at ingest time. The retrieval system itself does not change.
The right place for compute
Storage-time enrichment costs compute once, when content goes in. The benefit applies to every query that touches that content from that point forward.
Query-time fixes cost compute on every request. A reranker reruns on every search. Query expansion generates alternatives on every call. Multiple retrieval passes multiply the cost with every question asked. And they still cannot add information that was never stored.
The trade-off is not subtle: invest once at ingest, benefit on every query. Or invest on every query and still not fix the underlying problem.
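The asymmetry can be made concrete with a back-of-the-envelope cost model. All of the numbers below are invented for illustration; real ingest and reranking costs depend entirely on the models and volumes involved.

```python
# Assumed, illustrative unit costs (not measured figures).
ingest_cost_per_chunk = 0.002   # one-time enrichment call at storage
rerank_cost_per_query = 0.001   # recurring query-time fix

chunks = 10_000
queries = 1_000_000             # lifetime queries against the corpus

storage_time_total = chunks * ingest_cost_per_chunk   # paid once
query_time_total = queries * rerank_cost_per_query    # paid on every request
```

Under these assumptions the one-time ingest investment is a fraction of the recurring query-time spend, and unlike the reranker it can add information that was missing from the index.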
Where Klai is
Contextual retrieval and HyPE are both in the Klai pipeline architecture. They run at ingest time, before content is ever indexed.
This is not a performance optimisation. It is a quality decision that happens before retrieval is involved at all. The retrieval layer works better because the stored content is better, not because the search algorithm changed.
The specific storage-time choices (chunk size, context strategy, which questions to generate) depend on the content type. A PDF and a meeting transcript are processed differently, which is why not every document should be treated the same at ingest.
Next up in this series: the gap between what a user asks and what the knowledge base was written to answer, and what it takes to close that gap structurally rather than query by query. Read why knowledge base search fails users.