How to find and prioritise knowledge base gaps

Part of: Modelling knowledge

Every question your knowledge base cannot answer well is a data point. The user did not find what they needed. The retrieval system returned something, or returned nothing, and the session ended without a resolution.

Most systems discard that signal. The question disappears into logs, the gap stays open, and the next user who asks the same thing hits the same wall.

Gap detection is the process of turning those failed searches into a prioritised list of content to write. It does not require manual review of every query. It requires knowing what a low-confidence answer looks like, and what to do with it.

Two kinds of failure

Not all failed retrievals are the same, and the difference matters for how you respond.

A hard gap is silence: the knowledge base returned nothing. No article, no chunk, no partial match. The topic is simply absent. The right response is to write something.

A soft gap is low confidence: the system found results, but their similarity scores were too low to trust. Something exists in the knowledge base that loosely relates to the question, but it does not actually answer it. The right response is usually to improve or expand what is already there, not to write from scratch.
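
The distinction can be expressed as a small classifier. This is a minimal sketch, not a description of any particular product's implementation: the type names and the 0.75 threshold are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class RetrievalResult:
    chunk_id: str
    score: float  # similarity score; higher means a closer match

def classify_gap(results: list[RetrievalResult],
                 confidence_threshold: float = 0.75) -> str:
    """Return 'hard_gap', 'soft_gap', or 'answered' for one query's results."""
    if not results:
        return "hard_gap"   # silence: nothing retrieved at all
    best = max(r.score for r in results)
    if best < confidence_threshold:
        return "soft_gap"   # something loosely related exists, but too weak to trust
    return "answered"
```

The branch order mirrors the editorial response: an empty result set means "write something new", a weak best match means "improve what is there".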

Both types matter. Organisations that only track hard gaps miss the large middle ground where content exists but underperforms: articles that are too narrow, too technical, or written for the wrong vocabulary. Soft gaps often signal a mismatch between how users describe problems and how your knowledge base describes solutions.

What the signal looks like

A retrieval system returns a score alongside each result. That score reflects how closely the retrieved chunk matches the query. When scores are consistently low across all returned results, something is wrong: either the content is missing, or the match between how the question is phrased and how the content is written is poor.

The threshold matters and takes calibration. A score that indicates low confidence in one system may be normal in another, depending on the embedding model, the chunk size, and the query length. What you are looking for is not an absolute number but a pattern: queries that consistently fail to surface high-confidence results are the gaps worth prioritising.
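
One way to calibrate, sketched here as an assumption rather than a recommended method, is to derive the cutoff from the observed distribution of best-result scores instead of hard-coding an absolute number. The 20% percentile is purely illustrative.

```python
def calibrate_threshold(best_scores: list[float],
                        percentile: float = 0.2) -> float:
    """Pick a cutoff so the bottom `percentile` of queries count as low-confidence.

    best_scores: the top similarity score observed for each historical query.
    """
    ordered = sorted(best_scores)
    index = int(len(ordered) * percentile)
    return ordered[min(index, len(ordered) - 1)]
```

A distribution-relative cutoff adapts automatically when you swap embedding models or change chunk sizes, which is exactly the situation where an absolute threshold silently stops meaning anything.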

Frequency compounds the signal. A single low-confidence query might be an unusual phrasing or a one-off topic. The same question appearing from ten different users over two weeks is a structural gap in the knowledge base.
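
This frequency filter can be sketched in a few lines, assuming low-confidence queries have already been normalised into (query, timestamp) events. The 14-day window and minimum count of three are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

def recurring_gaps(events: list[tuple[str, datetime]],
                   window: timedelta = timedelta(days=14),
                   min_count: int = 3) -> list[tuple[str, int]]:
    """Return (query, count) pairs for gaps that recur within the window.

    events: (normalised query, timestamp) pairs for low-confidence retrievals.
    """
    cutoff = max(ts for _, ts in events) - window
    counts = Counter(q for q, ts in events if ts >= cutoff)
    return [(q, n) for q, n in counts.most_common() if n >= min_count]
```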

Turning gaps into a content queue

Gap detection only produces value when it connects to editorial action. A log of unanswered questions that no one reads is not a system; it is a graveyard.

The useful version surfaces gaps to the people who can fix them, in a form that makes the next step obvious. That means:

  • Grouping similar queries so the editor sees “twelve users asked variants of this question” rather than twelve individual log lines
  • Distinguishing hard gaps (no content) from soft gaps (weak content) so the response is targeted
  • Linking directly from the gap to the relevant knowledge base, so the editor can jump straight to writing
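
The first point, grouping variants of the same question, could be sketched as greedy clustering over query embeddings. Everything here is an assumption: the vectors are taken as precomputed, and the 0.85 similarity cutoff is illustrative, not a recommended value.

```python
def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

def group_queries(queries: list[tuple[str, list[float]]],
                  cutoff: float = 0.85) -> list[list[str]]:
    """Greedily assign each query to the first group whose seed is similar enough."""
    groups: list[tuple[list[float], list[str]]] = []
    for text, vec in queries:
        for seed_vec, members in groups:
            if cosine(vec, seed_vec) >= cutoff:
                members.append(text)
                break
        else:
            groups.append((vec, [text]))  # no close group found: start a new one
    return [members for _, members in groups]
```

Greedy single-pass grouping is crude compared to proper clustering, but it is cheap, order-stable, and good enough to turn twelve log lines into one editorial item.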

The last point matters more than it sounds. The friction between “here is a gap” and “here is where you fix it” is where gap detection programmes fail in practice. If acting on a gap requires navigating to a different system, finding the right knowledge base, and then creating a new article from scratch, most gaps will stay open.

What this does not solve

Gap detection tells you where the knowledge base is failing the questions being asked today. It does not tell you what questions are not being asked because users have given up on finding answers in the knowledge base at all.

It also does not prioritise by impact. A gap that affects one user asking an unusual question and a gap that affects the most common onboarding question both appear in the same detection log. Frequency is a proxy for priority, but it is an imperfect one: a single unanswered question that blocks a user from completing a critical task matters more than fifty low-stakes queries that users easily resolved another way.

Gap detection is an input to editorial judgement, not a replacement for it.

Where Klai is

Klai classifies every chat query that comes in: hard gap if the knowledge base returned nothing, soft gap if the results were too uncertain to trust. Both types surface in the /app/gaps dashboard, grouped by query and filterable by type and period.

For knowledge bases that live inside Klai, each gap links to the knowledge base where the fix belongs. For external sources, you still need to go there yourself. When a page is saved or a connector sync completes, Klai re-scores the open gaps against the updated index. Gaps that now clear the confidence threshold are marked as resolved automatically.
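
In outline, that re-scoring step might look like the sketch below. The `best_score` callable stands in for a real retrieval backend, and the threshold is an assumption; neither reflects Klai's actual internals.

```python
from typing import Callable

def rescore_open_gaps(open_gaps: list[str],
                      best_score: Callable[[str], float],
                      threshold: float = 0.75) -> tuple[list[str], list[str]]:
    """Re-run each open gap's query against the updated index.

    Returns (resolved, still_open): gaps whose best score now clears the
    confidence threshold are considered answered by the new content.
    """
    resolved: list[str] = []
    still_open: list[str] = []
    for query in open_gaps:
        target = resolved if best_score(query) >= threshold else still_open
        target.append(query)
    return resolved, still_open
```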

This is the detection layer behind the self-improving knowledge base: the loop only works if something is measuring where retrieval falls short.


Next up in this series: what happens after a gap is closed. How the knowledge base validates that the new content actually answers the question, and what a confidence improvement looks like over time. Read how to know if your knowledge base fix actually worked.