Grounding

What it is

Grounding is the practice of anchoring language model outputs to external, verifiable information sources. Grounded outputs reference specific documents or facts, which improves factuality and makes fact-checking possible. Retrieval-augmented generation (RAG) is the primary grounding technique: it retrieves relevant documents and conditions generation on them.

[illustrate: Model generating answer with source citations; answer can be verified by checking cited documents]

How it works

  1. Document retrieval:

    • Search external knowledge base for relevant documents
    • Retrieve top-k documents (dense, sparse, or hybrid retrieval)
  2. Prompt augmentation:

    • Concatenate retrieved documents with query
    • Example: “Based on the following sources: [doc1] [doc2]… Answer: [query]”
  3. Grounded generation:

    • Model generates answer conditioned on documents
    • Answer is constrained by document content, so it is only as accurate as the retrieved sources
  4. Citation generation:

    • Ideally, model includes source citations: “The answer is X (source_1)”
    • Enables user verification
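
The first two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (tokenize, retrieve, build_grounded_prompt), the toy knowledge base, and the token-overlap scoring are all hypothetical stand-ins. A real system would use a dense, sparse (for example BM25), or hybrid retriever, and would pass the resulting prompt to a language model, which is omitted here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, knowledge_base, k=2):
    """Toy sparse retrieval: rank documents by tokens shared with the query.

    Hypothetical stand-in for a real retriever; returns up to k
    (doc_id, text) pairs, best match first.
    """
    query_tokens = Counter(tokenize(query))
    scored = []
    for doc_id, text in knowledge_base.items():
        overlap = sum((query_tokens & Counter(tokenize(text))).values())
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by doc_id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, knowledge_base[doc_id]) for _, doc_id in scored[:k]]


def build_grounded_prompt(query, retrieved):
    """Concatenate the retrieved documents with the query (step 2)."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieved)
    return (
        "Based on the following sources:\n"
        f"{sources}\n"
        "Cite the source id in parentheses after each claim.\n"
        f"Answer: {query}"
    )


# Toy knowledge base, invented for illustration only.
knowledge_base = {
    "source_1": "Marie Curie discovered polonium in 1898.",
    "source_2": "Polonium is named after Poland.",
    "source_3": "The Eiffel Tower opened in 1889.",
}

query = "When did Marie Curie discover polonium?"
retrieved = retrieve(query, knowledge_base, k=2)
prompt = build_grounded_prompt(query, retrieved)
print(prompt)
```

Labelling each document with its id in the prompt is what lets the model cite a source in step 4; the unrelated document never reaches the prompt because it shares no tokens with the query.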

Example

# Without grounding:
Q: "What was Marie Curie's first major discovery?"
A: "She discovered polonium in 1897" (hallucination: the correct year is 1898)

# With grounding:
Retrieved: "Marie Curie discovered polonium in July 1898 in Paris."
Q: "What was Marie Curie's first major discovery?"
A: "Marie Curie discovered polonium in 1898 (source: Wikipedia-Marie-Curie)"
(Factuality improved; citation enables verification)
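
Part of the verification that a citation enables can be automated. The sketch below assumes citations follow the "(source: ID)" pattern used in the example above; the function names and the set of retrieved ids are hypothetical. It only checks that every cited id was actually retrieved. It does not check that the cited document supports the claim, which still requires reading the source or a separate fact-checking step.

```python
import re


def extract_citations(answer):
    """Return the source ids cited as '(source: ID)' in the answer."""
    return re.findall(r"\(source:\s*([^)]+?)\s*\)", answer)


def check_citations(answer, retrieved_ids):
    """Report which cited ids are missing from the retrieved set.

    An answer counts as grounded here only if it cites at least one
    source and every cited id is among the retrieved documents.
    """
    cited = extract_citations(answer)
    unknown = [source_id for source_id in cited if source_id not in retrieved_ids]
    return {
        "cited": cited,
        "unknown": unknown,
        "grounded": bool(cited) and not unknown,
    }


# Ids of the documents handed to the model (illustrative).
retrieved_ids = {"Wikipedia-Marie-Curie"}

grounded_answer = (
    "Marie Curie discovered polonium in 1898 (source: Wikipedia-Marie-Curie)"
)
ungrounded_answer = "She discovered polonium in 1897"

result_grounded = check_citations(grounded_answer, retrieved_ids)
result_ungrounded = check_citations(ungrounded_answer, retrieved_ids)
print(result_grounded)
print(result_ungrounded)
```

An answer with no citation at all is flagged as ungrounded, which gives a cheap signal for routing it to review.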

Variants and history

Grounding emerged from question-answering research in the early 2000s, when open-domain QA systems retrieved documents and then extracted answers from them. Retrieval-augmented generation (Lewis et al., 2020) made grounding standard for generative models. Several related lines of work build on it: citation generation improves transparency, attributed generation trains models to produce citations alongside their answers, and fact-checking and evidence retrieval complement grounding. Modern RAG systems (2023+) add iterative retrieval and multi-hop reasoning.

When to use it

Use grounding when:

  • Factuality is critical (medical, legal, financial advice)
  • Auditability is required (regulated industries)
  • Users need to verify claims
  • Knowledge base is dynamic and needs updating
  • Reducing hallucination is essential

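Grounding adds latency, because each request runs a retrieval step before generation, but it improves reliability because answers can be checked against their sources. The trade-off is a somewhat slower response in exchange for a more trustworthy one.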

See also