Evidence vs. claims: the distinction that changes everything

23 February 2026 Mark

Not all your sources are equal. A meeting transcript is not the same as a procedure your team has relied on for two years. A raw support ticket is not the same as the distilled answer your best support engineer gives. You already know this.

Most knowledge systems do not.

Why the obvious observation changes nothing

The default response to “not all sources are equal” is to add metadata. Tag things. Label this a procedure, label that a transcript. Build a taxonomy and let the AI use the tags as context when retrieving.

The problem is that taxonomy describes category. It does not describe causation. And the distinction that matters for a knowledge system is not what type of document something is. It is how that document came to exist.

Evidence and claims are different kinds of things

There are exactly two structural types of artifact in a knowledge system, defined by their causal relationship to reality.

Source documents as evidence

A source document records something that happened or existed independently of the knowledge system. A helpdesk transcript. A meeting recording. An imported policy PDF. These are evidence. They are immutable after ingest. The transcript cannot become more or less true over time. It captured what was said, at that moment. That fact does not change.

When a policy is revised, the new version arrives as a new source document alongside the original. The original stays: it is now the historical record of what the policy said before the change. Source documents accumulate. They do not update in place.
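A minimal sketch of that accumulate-only behaviour, in Python. The names `SourceDocument` and `EvidenceStore`, the field layout, and the sample policy text are all assumptions made for illustration; they do not describe any particular product.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class SourceDocument:
    doc_id: str
    kind: str          # e.g. "transcript" or "policy_pdf" (hypothetical labels)
    content: str
    ingested_on: date


class EvidenceStore:
    """Append-only store: documents can be added, never edited or removed."""

    def __init__(self) -> None:
        self._docs: dict[str, SourceDocument] = {}

    def ingest(self, doc: SourceDocument) -> None:
        if doc.doc_id in self._docs:
            raise ValueError(f"{doc.doc_id} already ingested; add a new document instead")
        self._docs[doc.doc_id] = doc

    def get(self, doc_id: str) -> SourceDocument:
        return self._docs[doc_id]

    def __len__(self) -> int:
        return len(self._docs)


store = EvidenceStore()
store.ingest(SourceDocument("policy-v1", "policy_pdf", "Refunds within 14 days.", date(2025, 3, 1)))
# A revised policy arrives as a new document; the original stays as the historical record.
store.ingest(SourceDocument("policy-v2", "policy_pdf", "Refunds within 30 days.", date(2026, 1, 15)))
```

The store deliberately has no update or delete method. Trying to assign to a field of a stored document raises `FrozenInstanceError`, which is the point: evidence is read, never rewritten.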

Knowledge artifacts as claims

A knowledge artifact is constructed: intentionally assembled to assert what is true. An extracted problem/solution pair. A procedure distilled from five similar support tickets. A decision record with the reasoning attached. These are claims. When a claim changes, the old version is preserved and linked to the new one. Knowledge artifacts evolve.
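One way to picture "preserved and linked" is a version chain. The sketch below assumes hypothetical `Claim` and `ClaimHistory` types and an integer version number; a real system might link versions by ID or timestamp instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    claim_id: str
    version: int
    statement: str
    evidence_ids: tuple[str, ...]      # source documents this claim was built from
    supersedes: Optional[int] = None   # version number of the claim this one replaces


class ClaimHistory:
    """Keeps every version of one claim; revising appends, it never overwrites."""

    def __init__(self, first: Claim) -> None:
        self._versions: list[Claim] = [first]

    @property
    def current(self) -> Claim:
        return self._versions[-1]

    def revise(self, statement: str, evidence_ids: tuple[str, ...]) -> Claim:
        old = self.current
        new = Claim(old.claim_id, old.version + 1, statement, evidence_ids,
                    supersedes=old.version)
        self._versions.append(new)
        return new

    def all_versions(self) -> list[Claim]:
        return list(self._versions)


# Illustrative data only.
history = ClaimHistory(
    Claim("refund-window", 1, "Refunds are accepted within 14 days.", ("policy-v1",))
)
history.revise("Refunds are accepted within 30 days.", ("policy-v1", "policy-v2"))
```

Each individual version is frozen, so the claim evolves by gaining versions, not by editing old ones.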

The operational rule follows directly: the system answers from claims and cites evidence. A raw transcript is never the answer. It is the basis for an answer.
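The rule can be enforced in the shape of the return type. In this sketch the `Answer` type, the `answer` function, and the exact-match lookup by topic key are all assumptions; the lookup stands in for whatever retrieval a real system uses, and the sample claim text is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    statement: str
    evidence_ids: tuple[str, ...]


@dataclass(frozen=True)
class Answer:
    text: str                    # always taken from a claim
    citations: tuple[str, ...]   # IDs of the evidence behind that claim


def answer(topic: str, claims: dict[str, Claim]) -> Optional[Answer]:
    """Answer from a claim and cite its evidence; raw evidence text is never returned."""
    claim = claims.get(topic)
    if claim is None:
        return None  # no claim exists yet: say so rather than serve a raw transcript
    return Answer(text=claim.statement, citations=claim.evidence_ids)


# Illustrative data only.
claims = {
    "voip-setup": Claim(
        "Restart the handset after changing the SIP profile.",
        ("ticket-101", "ticket-245"),
    )
}
result = answer("voip-setup", claims)
missing = answer("printer-setup", claims)
```

Because `answer` only ever receives claims, there is no code path by which a transcript's text can end up in `Answer.text`.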

Origin does not determine type

One clarification worth making: the origin does not determine the type. A conclusion saved from an AI-assisted conversation is a knowledge artifact, not because a human wrote it, but because someone made an intentional decision to assert that this is what we believe. An imported PDF is a source document, not because it came from outside, but because it records something that existed before the system encountered it.

What breaks when you conflate them

Treating source documents and knowledge artifacts as the same kind of thing causes three specific failures.

Broken confidence calibration

Confidence calibration stops working. A claim built from three independent transcripts should carry more weight than one based on a single passing remark. A flat index has no way to know the difference. It retrieves by similarity and presents both with equal authority.

Lost temporal reasoning

Temporal reasoning becomes impossible. “What did we know about the VoIP setup procedure before the February product update?” is answerable if claims carry provenance back to the evidence they were built from. It is unanswerable if you have only indexed chunks with no construction history.
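With versioned claims, that question reduces to an "as of" lookup. The sketch below assumes a hypothetical `ClaimVersion` record with a `valid_from` date; the statements and dates are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ClaimVersion:
    topic: str
    statement: str
    valid_from: date                 # when this version was asserted
    evidence_ids: tuple[str, ...]


def claim_as_of(versions: list[ClaimVersion], topic: str, when: date) -> Optional[ClaimVersion]:
    """Return the latest version of `topic` asserted on or before `when`."""
    candidates = [v for v in versions if v.topic == topic and v.valid_from <= when]
    return max(candidates, key=lambda v: v.valid_from, default=None)


# Illustrative data only.
versions = [
    ClaimVersion("voip-setup", "Configure the SIP profile manually.",
                 date(2025, 9, 1), ("ticket-101",)),
    ClaimVersion("voip-setup", "Use the setup wizard.",
                 date(2026, 2, 10), ("ticket-245", "release-notes-feb")),
]
before_update = claim_as_of(versions, "voip-setup", date(2026, 1, 31))
```

Here `before_update` is the manual-configuration version, together with the evidence it was built from at the time.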

Failed invalidation cascades

Invalidation cannot cascade. When a source is retracted (a product version deprecated, an assumption proven wrong), you want to flag every claim derived from it automatically. That requires knowing what was built from what. A flat index has no record of this.
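"Knowing what was built from what" is a dependency graph, and the cascade is a traversal of it. A sketch, assuming a hypothetical `built_from` mapping from each claim ID to the IDs of its inputs (evidence, or other claims):

```python
from collections import deque


def claims_to_review(retracted_id: str, built_from: dict[str, set[str]]) -> set[str]:
    """Return every claim that depends, directly or transitively, on `retracted_id`."""
    # Invert the mapping: for each input, which claims were built from it?
    dependents: dict[str, set[str]] = {}
    for claim_id, inputs in built_from.items():
        for input_id in inputs:
            dependents.setdefault(input_id, set()).add(claim_id)

    # Breadth-first walk from the retracted item through its dependents.
    flagged: set[str] = set()
    queue = deque([retracted_id])
    while queue:
        node = queue.popleft()
        for claim_id in dependents.get(node, ()):
            if claim_id not in flagged:
                flagged.add(claim_id)
                queue.append(claim_id)
    return flagged


# Illustrative data only.
built_from = {
    "claim-a": {"transcript-1", "transcript-2"},
    "claim-b": {"transcript-2"},
    "claim-c": {"claim-a"},          # a claim built on another claim
    "claim-d": {"transcript-3"},
}
flagged = claims_to_review("transcript-1", built_from)
```

Retracting `transcript-1` flags `claim-a` and, through it, `claim-c`. The function only flags; whether a flagged claim is actually wrong is still a review decision.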

The surface symptom

All three failures produce the same surface symptom: answers that look authoritative but cannot be traced or trusted. The system returns a well-formatted paragraph from a transcript and presents it as organisational knowledge. The reader has no way to tell whether they are reading a distilled belief or a verbatim snippet from one conversation two years ago.

What the two-layer model enables

The concrete version: a helpdesk transcript arrives as evidence. It is immutable, filed, and never served directly. From it, the system extracts a knowledge artifact: the problem, the resolution, and a confidence score reflecting how many independent sources confirm the finding. When a user asks the same question, they get the artifact. The transcripts behind it are cited. They are not the answer.

Confidence tracking with provenance

A claim derived from three independent sources might carry a confidence of 0.87. A claim resting on a single source might carry 0.4. The system knows the difference because it tracked the construction.
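How such a score is computed is left open here. One common option for combining independent confirmations is a noisy-OR: treat each source as having some chance of being right and ask how likely it is that at least one is. The sketch below assumes that rule and a made-up per-source reliability of 0.4; it is not meant to reproduce the exact figures above.

```python
import math


def combined_confidence(source_reliabilities: list[float]) -> float:
    """Noisy-OR: probability that at least one independent source is right."""
    if not all(0.0 <= r <= 1.0 for r in source_reliabilities):
        raise ValueError("reliabilities must be between 0 and 1")
    # Probability that every source is wrong, subtracted from 1.
    return 1.0 - math.prod(1.0 - r for r in source_reliabilities)


single = combined_confidence([0.4])            # one source
triple = combined_confidence([0.4, 0.4, 0.4])  # three independent sources
```

The rule only makes sense if the sources really are independent. Three tickets that all quote the same forum post are one source counted three times, which is another reason to track provenance rather than count matches.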

Historical reasoning and invalidation

The knowledge base can answer “what did we believe about this in Q3?” because claims carry timestamps and provenance. It can flag a claim for review when its source evidence is deprecated. It can update a claim without touching the evidence it was built from.

The cost

The trade-off

Maintaining the distinction requires a construction step. Source documents do not automatically become knowledge artifacts. They have to be processed: by extraction, by human review, or both.

The trade-off is real. If you treat everything as one type, you gain simplicity. Everything goes in, nothing to curate, fast to deploy. What you lose is the ability to distinguish what your organisation believes from what was said in one conversation. You lose confidence calibration, temporal reasoning, and clean invalidation.

When the cost surfaces

That loss does not surface on day one. It appears six months later, when someone bases a decision on an answer that was confidently styled but traces back to a single source that has since been superseded. The record looked right. The distinction was never maintained.

The temporal side of this distinction (evidence that never changes, claims that evolve) is also the reason you should never delete knowledge. The two concepts are the same idea viewed from different angles.

Next up in this series: the three metadata axes that every knowledge artifact carries, and why collapsing them into a single type label is the most common design failure in knowledge systems. Read three axes, not one label.