How to know if your knowledge base fix actually worked
You added the article. The gap is still showing.
This is the moment where most knowledge management systems let you down. Gap detection told you something was wrong. You fixed it. But the system has no way to connect what you wrote to the question that triggered the gap.
Detection without validation is half a system.
How other systems handle this
The established approach is manual verification. Guru, which explicitly models knowledge quality, routes new and updated content through a subject matter expert who confirms it is correct and complete. The gap is considered closed when a human signs off. That works, but with a caveat: someone has to connect the incoming question to the new content, and that connection is not automatic.
Most knowledge base tools do not address this at all, because their detection is not score-based to begin with. They can tell you that a user searched and did not find something. They cannot tell you why retrieval failed, which means they have no basis for testing whether the failure has been corrected.
Academic work on closed-loop RAG evaluation, measuring whether knowledge additions actually improve retrieval outcomes, appeared in research literature in 2025. It is a recognized problem without an established off-the-shelf solution.
What re-scoring requires
Automatic gap validation is only possible when the system has what it needs to rerun the original test: the query, the scores that triggered the gap, and the same threshold logic.
The gap was flagged because retrieval confidence fell below a threshold. Validating the fix means running the same query again and checking whether the score now clears that threshold. If it does, the gap is resolved.
This is only possible when gap detection is score-based. If a system knows only that a user searched and did not click a result, there is no score to compare against. The re-scoring test has nothing to run. This is why automatic validation is not available in most knowledge base tools: the detection mechanism does not produce the signal needed for the validation.
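The re-scoring check described above can be sketched in a few lines. This is a minimal illustration, not Klai's implementation: the `Gap` record, the `search` callable, and the threshold value are all hypothetical names standing in for whatever the real system stores at detection time.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.75  # illustrative value, not an actual product threshold

@dataclass
class Gap:
    query: str                  # the user question that triggered the gap
    score_at_detection: float   # top retrieval score when the gap was flagged
    resolved: bool = False

def revalidate(gap: Gap, search) -> bool:
    """Re-run the original query against the updated index and apply
    the same threshold logic that flagged the gap in the first place.

    `search` is assumed to return the top chunk's similarity score for
    a query, or None if no chunk is returned at all."""
    top_score = search(gap.query)
    if top_score is not None and top_score >= CONFIDENCE_THRESHOLD:
        gap.resolved = True
    return gap.resolved
```

The key point the sketch makes concrete: validation needs nothing beyond what score-based detection already recorded, the query and the threshold. A click-based detector has no equivalent of `search(gap.query)` to re-run.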
Hard gaps and soft gaps at closure
The two gap types introduced in the previous post resolve differently.
A hard gap (no content returned) resolves when the same query now returns at least one chunk above the confidence threshold. Content that was absent from the index is now present.
A soft gap (low-confidence result) resolves when the top-scoring chunk now clears the same threshold that originally flagged it. An article that was too narrow, too technical, or written in the wrong vocabulary has been improved enough for the match to land.
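The asymmetry between the two types lives entirely at detection time; at closure time they share one check. A sketch under the definitions above, with illustrative names:

```python
def classify_gap(top_score, threshold):
    """At detection time: no chunks returned at all is a hard gap;
    a best match below the threshold is a soft gap; otherwise no gap."""
    if top_score is None:
        return "hard"
    if top_score < threshold:
        return "soft"
    return None

def is_resolved(top_score, threshold):
    """At closure time both types use the same test: the best chunk
    for the original query now clears the original threshold."""
    return top_score is not None and top_score >= threshold
```

A hard gap moves from `top_score is None` to a passing score because content was added to the index; a soft gap moves from a sub-threshold score to a passing one because existing content was improved. The resolution predicate does not need to know which happened.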
Timing matters
Re-scoring has to happen after the new content is actually indexed. Writing a page and indexing it in the vector store are not the same operation.
If the re-scoring job runs before the index has updated, the new content is invisible to retrieval, the score does not change, and the gap stays open. It will resolve on the next trigger. In practice, a short delay between the page save event and the re-scoring job is enough. Most content pipelines complete indexing within a few seconds.
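The ordering constraint can be made concrete with a deliberately naive sketch. Everything here is hypothetical: the event handler name, the `index.contains` check, and the fixed delay stand in for a real pipeline, which would more likely trigger on an "indexing complete" event than sleep.

```python
import time

def on_page_saved(page_id, index, rescore_open_gaps, delay_seconds=5.0):
    """Naive ordering sketch: wait for the indexing pipeline to catch up
    before re-scoring, instead of re-scoring on the save event itself.
    If the new content is not yet queryable, skip the job; the gap will
    be re-tested on the next trigger."""
    time.sleep(delay_seconds)       # give the vector store time to index
    if index.contains(page_id):     # confirm the new content is retrievable
        rescore_open_gaps()
```

The sleep is the crude part; the invariant it protects is the real point. Re-scoring against a stale index produces a false "still open" result, which is harmless but wastes the trigger.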
What the count going down means
The signal gap validation produces is not just “this gap is closed.” It is: the content you added actually answered the question that triggered the gap.
A gap dashboard where the count decreases over time is evidence that the editorial process is working. A dashboard where the count only accumulates means fixes are not landing, or new gaps are appearing faster than old ones are being closed. The ratio matters more than the absolute number.
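That ratio is simple to state precisely. A sketch, assuming the dashboard counts gaps opened and gaps resolved over some reporting window (this metric definition is illustrative, not Klai's):

```python
def gap_health(opened_count, resolved_count):
    """Resolution ratio for a reporting window: resolved / opened.
    Above 1.0 means the backlog is shrinking; below 1.0 means gaps
    are appearing faster than they are being closed."""
    if opened_count == 0:
        # nothing new opened: any resolution is pure backlog burn-down
        return float("inf") if resolved_count > 0 else 1.0
    return resolved_count / opened_count
```

A team that opens ten gaps and closes twelve in a week is healthier than one that opens two and closes one, even though the second team's absolute count is lower.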
Where Klai is
When a page is saved in a Klai knowledge base, or when a connector sync completes successfully, Klai re-scores the most recent open gaps for that org against the updated index. Gaps that now clear the confidence threshold are marked as resolved and removed from the default dashboard view.
The re-scoring uses the same threshold logic that originally flagged the gap. A soft gap requires the same score to resolve as it did to trigger.
The dashboard count going down is intentional. It is not just that time has passed. It means your edits worked.
Next up in this series: why not every message should trigger a knowledge base search in the first place, and how a two-layer gate saves latency and avoids irrelevant answers. Read "Not every question needs the knowledge base."