How to Show Citations and Sources in an AI Answer

Make AI answers verifiable with citation UI that links every claim back to its source.

Intermediate · 16 min read · Updated 2026-06-12

In plain English

When an AI gives you an answer, there's a question underneath every sentence: is this real? Citations are the mechanism that turns that question into something the user can actually check. Instead of trusting the model on faith, the user can click a small superscript [1] and read the source document for themselves.

[Diagram: Show Citations and Sources in an AI Answer. Source: uwcchina.libguides.com]

Think of it like a Wikipedia article. The prose reads smoothly, but every significant claim has a bracketed number linking to a reference at the bottom of the page. If you doubt a sentence, you follow the link. The numbered footnote doesn't interrupt your reading — it just makes verification possible. Citation UI for AI answers works the same way: the primary experience is the answer, but the source scaffolding is always one click away.

This article covers the full stack: how to pass source metadata through a retrieval-augmented generation (RAG) pipeline without losing it, how to prompt the model to output citation markers without hallucinating, and how to render those markers as inline superscripts, hover previews, and source cards that users can actually trust.

Why citations matter in AI products

LLMs hallucinate. Not always, not obviously, but reliably enough that unverified AI answers create real liability in any domain where accuracy matters: legal, medical, financial, technical documentation, customer support. A 2024 Nature study found that large language models fabricate citations in roughly 36% of generated references when asked to produce them from memory. That number drops dramatically when the model is grounded on retrieved documents and asked only to reference them — but only if the UI makes those references visible and clickable.

From a UX perspective, citations change the nature of trust. Without them, the user is trusting the model unconditionally. With them, trust becomes verifiable: the user can spot-check any claim in seconds. Products with citations tend to draw fewer complaints about hallucinations causing harm, not only because a grounded model is more careful, but because users catch and report errors early that would otherwise go unnoticed, which closes the feedback loop.

Citations also differentiate a product. Perplexity popularised the pattern of search-as-synthesis with inline numbered sources, and users came to expect it from any AI tool that draws on external information. An AI assistant that gives answers with no indication of where the information came from now feels less trustworthy by comparison — even when its accuracy is identical.

Without citations	With citations
User must trust the model blindly	User can verify any claim in one click
Hallucinations go undetected until downstream harm	Hallucinations are surfaced and reported quickly
No feedback loop on accuracy	Users flag wrong citations; you can trace failures
Legally risky in regulated domains	Traceability supports audit and compliance requirements
Feels like autocomplete at scale	Feels like a well-sourced analyst

How citation pipelines work

Implementing citations is a pipeline problem, not just a UI problem. The citation has to be born in the retrieval step, survive the prompt construction, be output by the model in a parseable form, and finally be rendered in the UI with the right interaction. If any step drops the metadata, the citation breaks.

Citation data flow through a RAG pipeline:

1. Retrieve chunks: vector search returns chunks + metadata
2. Assign source IDs: number each chunk [1], [2], [3]...
3. Build prompt: inject chunks with IDs into system message
4. Model generates reply: outputs [1], [2] markers in prose
5. Parse & resolve: map marker numbers back to metadata
6. Render UI: superscripts, hover cards, source panel

Step 1: Preserve metadata through retrieval

Every vector database — Pinecone, Weaviate, Chroma, pgvector — can store metadata alongside the embedding vector. At indexing time, attach at minimum: source_url, title, document_id, and chunk_index to each chunk. Most pipelines store this but then discard it by the time the response is assembled. The fix is simple: when you call .query() or .similarity_search(), capture the full result objects, not just the page content strings.

retrieve_with_metadata.py (Python)

# LangChain example: capture full Document objects, not just text
from langchain_community.vectorstores import Chroma

# `embedder` is assumed to be an embeddings object created elsewhere
vectorstore = Chroma(persist_directory="./db", embedding_function=embedder)

def retrieve_sources(query: str, k: int = 5):
    """Return chunks with their metadata intact."""
    docs = vectorstore.similarity_search(query, k=k)
    sources = []
    for i, doc in enumerate(docs, start=1):
        sources.append({
            "id": i,                           # citation number shown to model
            "content": doc.page_content,
            "title": doc.metadata.get("title", "Untitled"),
            "url": doc.metadata.get("source_url", ""),
            "chunk_index": doc.metadata.get("chunk_index", 0),
        })
    return sources
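
The retrieval code above can only return metadata that was written at indexing time. A minimal sketch of that step follows, assuming a naive fixed-size character splitter and a hypothetical `make_chunk_records` helper (neither comes from the original pipeline). The metadata keys are the four fields listed above.

```python
def make_chunk_records(document_id: str, title: str, source_url: str,
                       text: str, chunk_size: int = 200) -> list[dict]:
    """Split one document into chunks and attach citation metadata to each.

    Hypothetical helper: a real pipeline would use a token-aware splitter.
    """
    records = []
    for chunk_index, start in enumerate(range(0, len(text), chunk_size)):
        records.append({
            "text": text[start:start + chunk_size],
            "metadata": {
                "document_id": document_id,
                "title": title,
                "source_url": source_url,
                "chunk_index": chunk_index,
            },
        })
    return records


records = make_chunk_records(
    document_id="doc-42",
    title="Refund policy",
    source_url="https://example.com/refunds",
    text="x" * 450,
)
print(len(records), records[2]["metadata"])
```

Each record's text and metadata can then be handed to the vector store together, so the metadata travels with the embedding rather than living in a separate lookup table.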

Step 2: Inject numbered sources into the system prompt

Format the retrieved chunks as a numbered list in the system message. The model sees each chunk prefixed with its ID — [1], [2] etc. — and is instructed to reference those IDs when it uses information from that chunk. The model does not need to know the URL; it only needs to output the number. Your rendering layer holds the URL and maps the number to it.

build_prompt.py (Python)

def build_system_prompt(sources: list[dict]) -> str:
    source_block = "\n".join(
        f"[{s['id']}] {s['title']}\n{s['content']}"
        for s in sources
    )
    return f"""You are a helpful assistant. Answer using only the sources below.
After each claim, cite the source number in square brackets, e.g. [1] or [2][3].
Do not invent URLs. Only reference source numbers that appear in this list.

--- SOURCES ---
{source_block}
--- END SOURCES ---"""

Step 3: Parse citation markers from the model output

After the model responds, parse [1], [2] patterns from the answer text and replace them with rich citation objects for your UI. A simple regex captures all markers; you then look each number up in the sources array you built in Step 1.

parse-citations.ts (TypeScript)

interface Source {
  id: number;
  title: string;
  url: string;
}

interface CitationSegment {
  type: "text" | "citation";
  content: string;       // raw text for 'text' segments
  source?: Source;       // resolved source for 'citation' segments
}

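function parseAnswer(
  rawAnswer: string,
  sources: Source[]
): CitationSegment[] {
  // Splitting on a capture group keeps the [1], [2], ... markers in the result
  const parts = rawAnswer.split(/(\[\d+\])/g);
  return parts
    // split() yields empty strings around adjacent markers such as [2][3]
    .filter((part) => part !== "")
    .map((part): CitationSegment => {
      const match = part.match(/^\[(\d+)\]$/);
      if (match) {
        const id = parseInt(match[1], 10);
        // source stays undefined when the model cites an ID it was never given
        const source = sources.find((s) => s.id === id);
        return { type: "citation", content: part, source };
      }
      return { type: "text", content: part };
    });
}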
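
The same lookup is often useful on the server as well, for example to log which retrieved chunks were actually cited or to send the client only the sources the answer uses. A rough Python sketch, assuming sources shaped like the dictionaries built in Step 1 (the `cited_sources` name is hypothetical):

```python
import re

MARKER = re.compile(r"\[(\d+)\]")


def cited_sources(answer: str, sources: list[dict]) -> list[dict]:
    """Return the sources cited in `answer`, in order of first appearance."""
    by_id = {s["id"]: s for s in sources}
    seen: set[int] = set()
    ordered = []
    for match in MARKER.finditer(answer):
        source_id = int(match.group(1))
        # Skip unknown IDs and repeats of a source already collected
        if source_id in by_id and source_id not in seen:
            seen.add(source_id)
            ordered.append(by_id[source_id])
    return ordered


sources = [
    {"id": 1, "title": "Pricing", "url": "https://example.com/pricing"},
    {"id": 2, "title": "Refunds", "url": "https://example.com/refunds"},
    {"id": 3, "title": "Shipping", "url": "https://example.com/shipping"},
]
answer = "Refunds take 5 days [2]. Plans start at $10 [1]. Refunds are free [2]."
print([s["title"] for s in cited_sources(answer, sources)])  # ['Refunds', 'Pricing']
```

Source 3 was retrieved but never cited, so it is left out; whether to still show it in the UI is a product decision.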

Citation UI patterns

Once the parsed segments are in hand, you have several rendering options. The best choice depends on your content density and how much screen real estate you have. All three patterns below are in production use at major AI products as of 2025–2026.

Pattern A: Inline superscript footnotes

Replace each [1] marker with a small superscript — ¹ — that is an anchor link jumping to a numbered source list below the answer. This is the academic citation model, the same pattern Wikipedia uses. It produces the cleanest reading experience: the answer prose flows uninterrupted and the sources sit neatly below. Hover previews (a tooltip showing the source title and a snippet) are a natural enhancement that avoids requiring the user to scroll down.

CitationMark.tsx (TypeScript)

import { useState } from "react";

interface Source { id: number; title: string; url: string; snippet?: string; }

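function CitationMark({ source }: { source: Source }) {
  const [open, setOpen] = useState(false);
  const tooltipId = `citation-tooltip-${source.id}`;
  const showCard = open && Boolean(source.snippet);

  // Handlers live on the wrapper so the card stays open while the pointer moves
  // from the marker to the "Open source" link; focus handlers cover keyboard users.
  return (
    <span
      className="citation-mark-wrapper"
      onMouseEnter={() => setOpen(true)}
      onMouseLeave={() => setOpen(false)}
      onFocus={() => setOpen(true)}
      onBlur={() => setOpen(false)}
    >
      <a
        href={`#source-${source.id}`}
        className="citation-superscript"
        aria-label={`Source ${source.id}: ${source.title}`}
        aria-describedby={showCard ? tooltipId : undefined}
      >
        {source.id}
      </a>
      {showCard && (
        <div id={tooltipId} className="citation-hover-card" role="tooltip">
          <strong>{source.title}</strong>
          <p>{source.snippet}</p>
          <a href={source.url} target="_blank" rel="noopener noreferrer">
            Open source
          </a>
        </div>
      )}
    </span>
  );
}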

Pattern B: Source cards below the answer

Render a horizontal row of source cards beneath the answer. Each card shows a favicon, the page title, the domain name, and optionally a one-sentence excerpt. This is the pattern used by Perplexity and Google AI Overviews. It is the easiest to implement, because no inline parsing is needed (you just render the sources list you already have), and it works well when the answer is short and the sources are highly scannable.

SourceCards.tsx (TypeScript)

interface Source { id: number; title: string; url: string; snippet?: string; }

function SourceCards({ sources }: { sources: Source[] }) {
  if (!sources.length) return null;
  return (
    <div className="source-cards" aria-label="Sources">
      <span className="source-cards-label">Sources</span>
      <div className="source-cards-grid">
        {sources.map((s) => {
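          // new URL() throws on empty or malformed URLs, so fall back to the raw string
          let domain = s.url;
          try {
            domain = new URL(s.url).hostname.replace(/^www\./, "");
          } catch {
            // keep the fallback value
          }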
          return (
            <a
              key={s.id}
              href={s.url}
              target="_blank"
              rel="noopener noreferrer"
              className="source-card"
              id={`source-${s.id}`}
            >
              <img
                src={`https://www.google.com/s2/favicons?domain=${domain}&sz=16`}
                alt=""
                className="source-card-favicon"
              />
              <div className="source-card-body">
                <span className="source-card-title">{s.title}</span>
                <span className="source-card-domain">{domain}</span>
              </div>
            </a>
          );
        })}
      </div>
    </div>
  );
}

Pattern C: Hover preview tooltip

The hover preview sits between superscript and full card: a tooltip that appears when the user mouses over an inline citation marker. It shows the source title, a short excerpt from the relevant chunk, and a link to open the full source. This is the pattern Copilot uses (called a "citation chip") and it works particularly well on desktop where hovering is natural. On mobile, tap-to-expand is the equivalent interaction.

UI pattern	Best for	Implementation effort	Used by
Inline superscript + source list	Long answers, document Q&A, research tools	Medium — requires inline parsing + anchor scroll	Wikipedia model; many enterprise RAG tools
Source cards below answer	Short answers, search-style queries, mobile	Low — render sources[] array directly	Perplexity, Google AI Overviews
Hover preview tooltip	Dense interfaces, desktop-first products	Medium-high — tooltip positioning, mobile fallback	Microsoft Copilot, Granola, Glean

Common pitfalls and how to avoid them

Pitfall 1: The model invents citation numbers

If you send the model five sources numbered [1]–[5] but the model outputs [6] or [7], it is fabricating a citation that does not exist in your context. Guard against this by validating the output: after parsing, drop any citation marker whose ID is not in the sources list, and optionally add a UI label — "Source not found" — so the user knows a claim was unsupported rather than silently losing the reference. Smaller models (7B–13B parameter range) are more prone to this; current frontier models like GPT-5.5 and Claude Opus 4.8 are more reliable at staying within the provided source IDs.
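
The validation step described above can be sketched in a few lines of Python. This is an illustration rather than a prescribed implementation: the `validate_citations` name is hypothetical, and it assumes markers use the plain `[n]` form requested in the system prompt.

```python
import re

# Optional leading space is captured so a dropped marker leaves no gap behind
MARKER = re.compile(r"[ \t]?\[(\d+)\]")


def validate_citations(answer: str, valid_ids: set[int]) -> tuple[str, list[int]]:
    """Drop markers whose ID is not in `valid_ids`.

    Returns the cleaned answer and the list of rejected IDs.
    """
    rejected: list[int] = []

    def check(match: re.Match) -> str:
        source_id = int(match.group(1))
        if source_id in valid_ids:
            return match.group(0)      # keep the marker untouched
        rejected.append(source_id)
        return ""                      # remove the fabricated marker

    return MARKER.sub(check, answer), rejected


cleaned, rejected = validate_citations(
    "Paris is the capital [1]. It has 2M people [7].",
    valid_ids={1, 2, 3, 4, 5},
)
print(cleaned)   # Paris is the capital [1]. It has 2M people.
print(rejected)  # [7]
```

The rejected list is what would drive a "Source not found" label or a log entry, so the unsupported claim is flagged instead of silently losing its reference.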

Pitfall 2: Attribution hallucination — right fact, wrong source

This is subtler than inventing a citation entirely. The model cites [2] for a claim that is actually from [3], or cites a source that is tangentially related but does not actually support the specific claim. Research from the University of Amsterdam (2025) distinguishes between correctness (is the answer right?) and faithfulness (does the cited source actually say this?). An answer can be correct but unfaithful — the claim is true, but the cited chunk does not support it. Adding a faithfulness checker (a secondary LLM call that asks "does this source text support this claim?") is the production-grade fix, but expensive. At minimum, keep chunk sizes small: a 512-token chunk is easier to verify than a 2,000-token chunk.

Pitfall 3: Stale or broken source URLs

Source cards that link to 404 pages destroy trust immediately. This is a data freshness problem: documents get moved, deleted, or put behind paywalls after indexing. Mitigations: store a last_verified timestamp on each document and show a warning badge if it is older than your freshness threshold; implement a background job that periodically checks URLs and marks broken ones as unavailable; and display the cached excerpt even when the URL is broken, so the user can see what the source said even if they cannot visit it.
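
One way to turn those mitigations into a display decision is a small classifier that the rendering layer consults for each source. The sketch below is an assumption-laden example: the `source_status` name, the 30-day threshold, and the `url_ok` flag (imagined as being set by the background link checker) are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def source_status(last_verified: datetime, url_ok: bool, now: datetime,
                  max_age: timedelta = timedelta(days=30)) -> str:
    """Classify a source for display as 'unavailable', 'stale', or 'fresh'."""
    if not url_ok:
        return "unavailable"   # show the cached excerpt, disable the link
    if now - last_verified > max_age:
        return "stale"         # show a warning badge next to the link
    return "fresh"


now = datetime(2026, 6, 12, tzinfo=timezone.utc)
recent = datetime(2026, 6, 1, tzinfo=timezone.utc)
old = datetime(2026, 3, 1, tzinfo=timezone.utc)

print(source_status(recent, True, now))   # fresh
print(source_status(old, True, now))      # stale
print(source_status(recent, False, now))  # unavailable
```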

Pitfall 4: Citation overload

A response like "The capital of France is Paris [1][2][3]" is not better than [1] alone. Over-citing clutters the UI, trains users to ignore markers, and often means the prompt is not selecting the most relevant single source. Cap the citation count per claim (one or two markers per sentence is almost always enough) and adjust the system prompt to reinforce that: "Cite the most relevant source number for each claim — do not cite multiple sources unless they say genuinely different things."
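
If the prompt change alone does not stop over-citing, a post-processing cap is a cheap backstop. A minimal sketch, assuming markers are written back to back with no spaces between them and that the first markers in a run are the ones worth keeping (both are assumptions, as is the `cap_citations` name):

```python
import re

# Two or more adjacent markers, e.g. "[1][2][3]"
MARKER_RUN = re.compile(r"(?:\[\d+\]){2,}")


def cap_citations(answer: str, max_per_claim: int = 2) -> str:
    """Trim each run of adjacent markers to at most `max_per_claim` unique markers."""

    def trim(match: re.Match) -> str:
        markers = re.findall(r"\[\d+\]", match.group(0))
        unique = list(dict.fromkeys(markers))   # dedupe, preserving order
        return "".join(unique[:max_per_claim])

    return MARKER_RUN.sub(trim, answer)


print(cap_citations("The capital of France is Paris [1][2][3]."))
# The capital of France is Paris [1][2].
print(cap_citations("The capital of France is Paris [1][2][3].", max_per_claim=1))
# The capital of France is Paris [1].
```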

Going deeper

Once basic citation display is working, several advanced patterns are worth knowing about — especially for products where source accuracy is a core feature, not just a nice-to-have.

Span-level citations

Standard citations link a claim to a document. Span-level citations go further: they link a specific claim to a specific passage within the document, and highlight that passage when the user clicks. Adobe Acrobat AI does this for PDFs: the citation jumps to the exact page and highlights the sentence. LlamaIndex's CitationQueryEngine moves in this direction by re-splitting retrieved nodes into smaller citation chunks and labeling each one individually, so a citation number maps to a narrower passage rather than a full retrieved node; the smaller the citation chunk size, the closer this gets to a sentence-level window. This dramatically improves verifiability, but highlighting the exact passage requires storing bounding-box or character-offset metadata at index time.
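
For plain-text sources, character offsets are the simplest form of span metadata to store. The sketch below shows one way to record them, assuming a naive punctuation-based sentence splitter (it will mis-split abbreviations such as "e.g.") and a hypothetical `sentence_spans` helper:

```python
import re


def sentence_spans(chunk_text: str, chunk_start: int = 0) -> list[dict]:
    """Split a chunk into sentences with character offsets into the source document.

    `chunk_start` is the offset of the chunk itself within the full document.
    """
    spans = []
    for match in re.finditer(r"[^.!?]+[.!?]*", chunk_text):
        raw = match.group(0)
        sentence = raw.strip()
        if not sentence:
            continue
        leading = len(raw) - len(raw.lstrip())   # whitespace before the sentence
        start = chunk_start + match.start() + leading
        spans.append({"text": sentence, "start": start, "end": start + len(sentence)})
    return spans


document = "Refunds take five days. Shipping is free! Contact support."
spans = sentence_spans(document)
for span in spans:
    # Slicing the document with the stored offsets recovers the sentence
    print(span["start"], span["end"], document[span["start"]:span["end"]])
```

With `start` and `end` stored alongside each labelled span, the UI can highlight the cited sentence inside the rendered source instead of only linking to the page.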

Faithfulness checking as a post-processing step

A faithfulness checker is a second LLM call that runs after the main generation: for each (claim, cited_chunk) pair, it asks the model "does this chunk support this claim?" and returns a score or boolean. Products like Ragas and DeepEval provide off-the-shelf faithfulness metrics you can integrate into your evaluation pipeline. Running faithfulness checks on every response in production is expensive; a more practical approach is to run it on a sampled 5–10% of responses and use the results to tune your prompt and chunking strategy.
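
The sampling approach can be sketched as follows. This is not the Ragas or DeepEval API: `claim_pairs`, `check_sample`, and `stub_judge` are hypothetical names, the sentence splitting is deliberately naive, and `stub_judge` is a keyword-overlap placeholder standing in for the real LLM call.

```python
import random
import re
from typing import Callable, Optional

Judge = Callable[[str, str], bool]   # (claim, chunk_text) -> is the claim supported?


def claim_pairs(answer: str, sources: dict[int, str]) -> list[tuple[str, str]]:
    """Pair each cited sentence with the text of every chunk it cites."""
    pairs = []
    # A sentence, its end punctuation, and any markers that trail it
    for sentence in re.findall(r"[^.!?]+[.!?]?(?:\s*\[\d+\])*", answer):
        ids = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        claim = re.sub(r"\s*\[\d+\]", "", sentence).strip()
        pairs += [(claim, sources[i]) for i in ids if i in sources]
    return pairs


def check_sample(answer: str, sources: dict[int, str], judge: Judge,
                 sample_rate: float = 0.1,
                 rng: Optional[random.Random] = None) -> Optional[list[tuple[str, str]]]:
    """Return unsupported (claim, chunk) pairs, or None if this response was not sampled."""
    rng = rng or random.Random()
    if rng.random() >= sample_rate:
        return None
    return [(c, s) for c, s in claim_pairs(answer, sources) if not judge(c, s)]


def stub_judge(claim: str, chunk: str) -> bool:
    """Placeholder for the LLM call: every word of 5+ letters must appear in the chunk."""
    return all(w in chunk.lower() for w in re.findall(r"[a-z]{5,}", claim.lower()))


sources = {1: "Refunds are processed within five days.", 2: "Shipping costs $5 per order."}
answer = "Refunds take five days [1]. Shipping is always free [2]."
print(check_sample(answer, sources, stub_judge, sample_rate=1.0))
```

Returning `None` for unsampled responses keeps "not checked" distinct from "checked and fully supported", which matters when the results feed a dashboard.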

Streaming citations

When your answer streams token by token, citation markers arrive mid-stream. There are two approaches to rendering them: deferred (buffer the full response, parse it once complete, then display with citations) or progressive (detect [ patterns in the stream buffer and speculatively render a citation placeholder that resolves once the closing ] arrives). Most products use the deferred approach because it is simpler and citation rendering is not the part where latency matters most. Progressive rendering is worth the complexity only if your average response is very long and users need to start clicking sources before the answer finishes.
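
For the progressive approach, the core of the problem is deciding how much of the buffer is safe to emit. A rough sketch of that logic, written as a Python generator for brevity (a browser client would do the same thing in TypeScript); `stream_segments` and the event tuple shape are hypothetical:

```python
import re
from typing import Iterable, Iterator


def stream_segments(tokens: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield ("text", str) and ("citation", "N") events from a token stream.

    Text is emitted as early as possible; an unclosed "[" followed only by
    digits is held back until it is clear whether it is a citation marker.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        while buffer:
            start = buffer.find("[")
            if start == -1:                      # no bracket: everything is plain text
                yield ("text", buffer)
                buffer = ""
                break
            if start > 0:                        # flush text before the bracket
                yield ("text", buffer[:start])
                buffer = buffer[start:]
            match = re.match(r"\[(\d+)\]", buffer)
            if match:                            # complete marker such as [12]
                yield ("citation", match.group(1))
                buffer = buffer[match.end():]
            elif re.fullmatch(r"\[\d*", buffer):
                break                            # maybe a partial marker: wait for more
            else:                                # a bracket that is not a citation
                yield ("text", "[")
                buffer = buffer[1:]
    if buffer:                                   # stream ended mid-marker
        yield ("text", buffer)


tokens = ["Paris is the capital ", "[", "1", "]", ". See [note] too [2", "3]"]
events = list(stream_segments(tokens))
print(events)
```

The final flush means a stream that ends on a dangling `[1` is shown as literal text and never dropped.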

Citation accessibility

A bare superscript number gives screen-reader users no context: it is announced as a lone digit with no indication of what it refers to. Wrap each citation mark in an <a> with an aria-label like "Source 1: Title of the source". Use aria-describedby to associate the hover tooltip with the trigger element so screen readers announce the preview content. The react-a11y-footnotes library provides a full accessible footnote implementation (FootnotesProvider + FootnoteRef + Footnotes components) that handles numbering, anchoring, and ARIA attributes automatically, which is useful if you are building a document-oriented product rather than a chat interface.

LlamaIndex and LangChain built-ins

Both major RAG frameworks offer citation utilities. LlamaIndex's CitationQueryEngine (available in llama-index-core) wraps any existing index: it splits retrieved nodes into citation-sized chunks (default 512 tokens, configurable), injects a citation prompt instructing the model to reference source numbers, and returns both the answer and the source nodes. LangChain's approach relies on structured output: .withStructuredOutput() with a Zod schema in LangChain.js, or with_structured_output() with a Pydantic model in Python, coerces the model into returning a structured response that includes an array of source IDs alongside the answer text. Both are good starting points; the custom approach in this article gives you more control over metadata shape and UI rendering.

FAQ

Should the model output citation URLs directly, or should my app inject them?

Always inject them from your app. When you ask the model to generate URLs from memory, it produces plausible-looking but often incorrect links — models hallucinate URLs convincingly. Instead, have your retrieval layer return verified source URLs, number each source [1], [2] etc., and ask the model only to output the number. Your rendering layer maps the number back to the real URL stored in your metadata.

What is the difference between a source card and an inline citation?

A source card is a standalone UI element below the answer listing all sources used — title, domain, favicon, optional excerpt. An inline citation is a marker (superscript or bracket) placed directly inside the answer text next to the specific claim it supports. Source cards are easier to implement; inline citations give more precise attribution. Most polished products use both: inline markers in the prose, with source cards below for quick scanning.

How do I handle citations when the answer is streaming?

The simplest approach is deferred rendering: display the streamed text as it arrives while accumulating it in a buffer, then parse citation markers once the stream closes and render the final annotated version. This means the user briefly sees plain-text brackets before they resolve to rich citations, which is generally acceptable. Progressive rendering, which resolves markers as they arrive mid-stream, is possible but adds complexity and is only worthwhile for very long responses.

What is attribution hallucination, and how is it different from regular hallucination?

A regular hallucination is when the model states something false. An attribution hallucination is when the model cites a source that does not actually support the claim — the answer may even be correct, but the cited chunk does not say it. This is caught by faithfulness checking: a secondary evaluation that asks whether each cited passage actually entails the claim it is attached to. Tools like Ragas and DeepEval provide faithfulness metrics for this purpose.

How many sources should I retrieve and display per answer?

Retrieve 3–8 chunks for most queries. Display 2–5 source cards — more than 5 starts to feel like information overload and users stop reading them. For inline citations, cap at one or two marker numbers per sentence; citing three or four sources for a single claim suggests your retrieval step is returning redundant chunks that should be deduplicated or your prompt is not guiding the model to select the most relevant single source.
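
Deduplication and the display cap can be applied in one pass before source IDs are assigned. A minimal sketch, assuming chunks arrive sorted best-first and carry the `document_id` field from indexing (the `select_display_sources` name is hypothetical):

```python
def select_display_sources(chunks: list[dict], max_cards: int = 5) -> list[dict]:
    """Keep the highest-ranked chunk per document, capped at `max_cards`."""
    seen: set[str] = set()
    selected = []
    for chunk in chunks:                 # assumed sorted by relevance, best first
        key = chunk["document_id"]
        if key in seen:
            continue                     # a better chunk from this document is already kept
        seen.add(key)
        selected.append(chunk)
        if len(selected) == max_cards:
            break
    return selected


chunks = [
    {"document_id": "a", "title": "A (chunk 1)"},
    {"document_id": "a", "title": "A (chunk 2)"},
    {"document_id": "b", "title": "B (chunk 1)"},
    {"document_id": "c", "title": "C (chunk 1)"},
]
print([c["title"] for c in select_display_sources(chunks, max_cards=2)])
# ['A (chunk 1)', 'B (chunk 1)']
```

Deduplicating by document is a coarse rule; two chunks from the same long document can say genuinely different things, so products that cite at the passage level may prefer a text-similarity check.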

Do I need citations if I'm not building a RAG app?

If your LLM is answering from its own training knowledge rather than retrieved documents, you cannot provide document-level citations. In that case, the honest approach is to surface uncertainty hedges in the model's language ("I believe...", "as of my last training data...") and direct users to authoritative external sources for verification. Citations as a UI pattern only make sense when you have verified, retrievable source documents to link to.
