Why knowledge base search fails users

Part of: Modelling knowledge

You wrote the knowledge base in the language of the people who built the product. Your users ask questions in the language of their problem.

These are not the same language. A support article titled “Webhook delivery configuration” answers the question “why aren’t my notifications coming through,” but the words do not overlap. A dense retrieval system (one that matches on meaning, not just keywords) helps. It does not close the gap entirely.

The mismatch is structural. Users describe symptoms. Knowledge bases describe solutions. Closing that gap requires work done before the question is ever asked.

Why retrieval systems can’t fix the vocabulary gap alone

The instinct when retrieval misses is to adjust the retrieval. Try a different embedding model. Lower the similarity threshold. Add keyword search alongside the semantic search.

These adjustments help when the right article exists and the retrieval system failed to surface it. They do not help when the right article was written for a different vocabulary entirely.

The article about “webhook delivery configuration” is correctly indexed. When a user types “notifications not coming through,” the semantic search has to bridge a gap between two entirely different conceptual frames: the builder’s mental model (webhook, delivery, endpoint) and the user’s mental model (notification, not working, broken). Dense retrieval is better at this than keyword search, but it is not perfect, and some gaps are wide enough that no similarity score bridges them.

The problem was set at the moment the article was written. The retrieval system is not where it gets fixed.

Closing the gap at storage time: HyPE and contextual retrieval

The previous post covered contextual retrieval: prepending context to chunks so that embeddings capture each piece in its setting. A different technique targets the vocabulary gap specifically.

HyPE (Hypothetical Prompt Embeddings) inverts the problem. Instead of storing only the article and hoping users phrase their questions in terms that match, the system generates a set of questions the article answers, and stores those questions alongside the content. “Why aren’t my notifications coming through?” is now in the index as a pre-generated question linked to the webhook configuration article.

When a user asks that question, the system matches against the hypothetical questions, not the raw article text. The gap has been bridged at storage time, not at search time.

The cost is compute at ingest: generating questions for every chunk. The benefit applies to every query that touches that content. This is the same trade-off as contextual retrieval: spend once going in, benefit every time.
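The ingest step can be sketched in a few lines. This is a minimal illustration, not any particular implementation: `generate_questions` stands in for an LLM call and `embed` for an embedding model, both stubbed with hard-coded values so the sketch runs on its own.

```python
def generate_questions(chunk: str) -> list[str]:
    # Stand-in for an LLM prompt along the lines of
    # "list the questions a user might ask that this passage answers".
    # Hard-coded here so the sketch runs without a model.
    return [
        "Why aren't my notifications coming through?",
        "How do I configure webhook delivery?",
    ]

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: a toy two-dimensional vector.
    return [float(sum(map(ord, text)) % 97), float(len(text))]

def ingest(chunk_id: str, chunk: str, index: list[dict]) -> None:
    # The core HyPE move: store one index entry per hypothetical
    # question, each pointing back at the original chunk.
    for question in generate_questions(chunk):
        index.append({
            "vector": embed(question),
            "question": question,
            "chunk_id": chunk_id,
        })

index: list[dict] = []
ingest("kb-42", "Webhook delivery configuration: ...", index)
# The index now matches on question phrasing, not article text.
```

At query time the user's question is embedded and compared against these question vectors; a hit returns the linked chunk, not the question itself.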

Closing the gap at query time: hybrid search and query expansion

Two query-time techniques reduce the gap without requiring changes to stored content.

Query expansion generates alternative phrasings of the user’s question before searching. “Notifications not coming through” becomes three or four variant queries, including more technical formulations. The system runs all of them and combines the results. This is more expensive per query than a standard search, and it adds latency.
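The mechanics are straightforward to sketch. In this toy version, `expand_query` stands in for the LLM paraphrase step and the search function is a hard-coded lookup; both are assumptions made so the example is self-contained.

```python
def expand_query(query: str) -> list[str]:
    # A real system would prompt a model for paraphrases, including
    # more technical formulations; hard-coded here so the sketch runs.
    return [
        query,
        "webhook delivery not working",
        "events not reaching endpoint",
    ]

def search_expanded(query: str, search_fn) -> list[str]:
    # Run every variant and merge the results, keeping first-seen order
    # and dropping duplicates.
    seen: set[str] = set()
    merged: list[str] = []
    for variant in expand_query(query):
        for doc_id in search_fn(variant):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

# Toy corpus: each phrasing surfaces different articles.
TOY_RESULTS = {
    "notifications not coming through": ["notification-faq"],
    "webhook delivery not working": ["webhook-config"],
    "events not reaching endpoint": ["webhook-config", "endpoint-setup"],
}
results = search_expanded(
    "notifications not coming through",
    lambda q: TOY_RESULTS.get(q, []),
)
# → ["notification-faq", "webhook-config", "endpoint-setup"]
```

The cost structure is visible in the loop: one search per variant, so three variants triple the retrieval work for a single user question.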

Hybrid search combines keyword matching with semantic matching. The classic approach is BM25, which scores documents based on exact word overlap. A newer approach uses a neural model that learns which terms matter and how much weight they carry, producing sparse vectors instead of raw term counts. Either way, the principle is the same: a user who knows the technical term gets strong keyword signal, a user who describes the symptom gets semantic signal, and together they cover more of the vocabulary space than either does alone.
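A toy version of that combination, assuming a simple weighted sum: exact word overlap stands in for BM25, and a hand-written vector table stands in for the dense embedding model.

```python
import math

DOCS = {
    "webhook-config": "webhook delivery configuration endpoint",
    "billing-faq": "invoice billing payment questions",
}

VECTORS = {  # pretend embeddings; a real system computes these
    "webhook-config": [0.9, 0.1],
    "billing-faq": [0.1, 0.9],
    "notifications not coming through": [0.8, 0.2],
}

def keyword_score(query: str, doc: str) -> float:
    # Exact word overlap: strong when the user knows the term,
    # zero when they only describe the symptom.
    return float(len(set(query.split()) & set(doc.split())))

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def hybrid_rank(query: str, alpha: float = 0.5) -> list[str]:
    # Weighted sum of the two signals; alpha balances keyword
    # against semantic evidence.
    scores = {
        doc_id: alpha * keyword_score(query, text)
        + (1 - alpha) * cosine(VECTORS[query], VECTORS[doc_id])
        for doc_id, text in DOCS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Zero keyword overlap with the webhook article, but the semantic
# signal still ranks it first.
top = hybrid_rank("notifications not coming through")[0]
```

Swap the query for one containing "webhook" and the keyword term dominates instead; that is the coverage argument in miniature.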

None of these techniques eliminates the gap; they reduce it. Articles written by engineers for engineers will remain harder to retrieve with user-phrased questions than articles written with the user's vocabulary in mind.

Where Klai is

Hybrid search is the baseline in Klai’s retrieval pipeline: learned sparse vectors and dense search combined, with results fused using Reciprocal Rank Fusion (a method that merges ranked lists from multiple search systems into a single ranked result). The sparse model is multilingual and context-aware, so it handles vocabulary mismatches better than classical keyword matching would. HyPE runs at ingest time as part of the enrichment pipeline. After fusion, a reranker scores each (query, chunk) pair with a cross-attention model for a final precision pass.
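Reciprocal Rank Fusion itself is compact enough to show in full. This is a generic sketch of the standard algorithm with toy document IDs, not Klai's actual code; k=60 is the commonly used default.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranked list contributes 1 / (k + rank) for every document
    # it contains; documents ranked well by several systems accumulate
    # the highest fused scores.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy ranked lists from a sparse and a dense retriever.
sparse = ["webhook-config", "api-limits", "billing-faq"]
dense = ["notification-faq", "webhook-config", "api-limits"]
fused = rrf([sparse, dense])
# "webhook-config" appears near the top of both lists, so it leads
# the fused ranking.
```

Because RRF works on ranks rather than raw scores, it needs no score normalization between the sparse and dense systems, which is why it is a common choice for fusing heterogeneous retrievers.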

Query expansion is in the architecture but not yet in production. The latency trade-off is the constraint: for synchronous chat responses, adding two or three extra retrieval passes before generating an answer adds up.


This is the last post in the “Modelling knowledge” series. The next series, “Retrieval that works”, digs into the engineering side: how gaps are detected, how retrieval decides when to skip, and how multiple search signals combine into one answer. Start with how to find and prioritize knowledge base gaps.