In plain English
When an AI gives you an answer, there's a question underneath every sentence: is this real? Citations are the mechanism that turns that question into something the user can actually check. Instead of trusting the model on faith, the user can click a small superscript [1] and read the source document for themselves.
Think of it like a Wikipedia article. The prose reads smoothly, but every significant claim has a bracketed number linking to a reference at the bottom of the page. If you doubt a sentence, you follow the link. The numbered footnote doesn't interrupt your reading — it just makes verification possible. Citation UI for AI answers works the same way: the primary experience is the answer, but the source scaffolding is always one click away.
This article covers the full stack: how to pass source metadata through a retrieval-augmented generation (RAG) pipeline without losing it, how to prompt the model to output citation markers without hallucinating, and how to render those markers as inline superscripts, hover previews, and source cards that users can actually trust.
Why citations matter in AI products
LLMs hallucinate. Not always, not obviously, but reliably enough that unverified AI answers create real liability in any domain where accuracy matters: legal, medical, financial, technical documentation, customer support. A 2024 Nature study found that large language models fabricate citations in roughly 36% of generated references when asked to produce them from memory. That number drops dramatically when the model is grounded on retrieved documents and asked only to reference them — but only if the UI makes those references visible and clickable.
From a UX perspective, citations change the nature of trust. Without them, the user is trusting the model unconditionally. With them, trust becomes verifiable: the user can spot-check any claim in seconds. Citations also close the feedback loop. Grounding the model on retrieved sources tends to reduce fabrication in the first place, and the errors that remain get caught and reported by users instead of going unnoticed.
Citations also differentiate a product. Perplexity popularised the pattern of search-as-synthesis with inline numbered sources, and users came to expect it from any AI tool that draws on external information. An AI assistant that gives answers with no indication of where the information came from now feels less trustworthy by comparison — even when its accuracy is identical.
| Without citations | With citations |
|---|---|
| User must trust the model blindly | User can verify any claim in one click |
| Hallucinations go undetected until downstream harm | Hallucinations are surfaced and reported quickly |
| No feedback loop on accuracy | Users flag wrong citations; you can trace failures |
| Legally risky in regulated domains | Traceability supports audit and compliance requirements |
| Feels like autocomplete at scale | Feels like a well-sourced analyst |
How citation pipelines work
Implementing citations is a pipeline problem, not just a UI problem. The citation has to be born in the retrieval step, survive the prompt construction, be output by the model in a parseable form, and finally be rendered in the UI with the right interaction. If any step drops the metadata, the citation breaks.
Step 1: Preserve metadata through retrieval
Every vector database — Pinecone, Weaviate, Chroma, pgvector — can store metadata alongside the embedding vector. At indexing time, attach at minimum: source_url, title, document_id, and chunk_index to each chunk. Most pipelines store this but then discard it by the time the response is assembled. The fix is simple: when you call .query() or .similarity_search(), capture the full result objects, not just the page content strings.
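The indexing side of this can be sketched without any particular vector store. The example below is a minimal illustration, not a reference implementation: `Chunk`, `chunk_document`, and the `max_chars` budget are hypothetical names chosen for this sketch, and the paragraph-based splitting is an assumption (a real pipeline would more likely use a token-aware splitter).

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One indexable chunk plus the metadata a citation will need later."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(document_id: str, title: str, source_url: str,
                   text: str, max_chars: int = 800) -> list[Chunk]:
    """Split text on blank lines and tag every chunk with its origin.

    Hypothetical helper: a single paragraph longer than max_chars is kept
    whole rather than split further.
    """
    chunks: list[Chunk] = []
    buffer = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Start a new chunk when adding this paragraph would exceed the budget.
        if buffer and len(buffer) + len(paragraph) + 2 > max_chars:
            chunks.append(Chunk(buffer))
            buffer = paragraph
        else:
            buffer = f"{buffer}\n\n{paragraph}" if buffer else paragraph
    if buffer:
        chunks.append(Chunk(buffer))
    # Attach the four fields named above to every chunk.
    for index, chunk in enumerate(chunks):
        chunk.metadata = {
            "document_id": document_id,
            "title": title,
            "source_url": source_url,
            "chunk_index": index,
        }
    return chunks
```

The resulting texts and metadata dicts can then be handed to the vector store together, so that each stored vector carries its own `source_url` and `chunk_index`. The query side follows.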
```python
# LangChain example: capture full Document objects, not just text
from langchain_community.vectorstores import Chroma

# `embedder` is assumed to be an embeddings object created elsewhere
# (the same one that was used to build the index).
vectorstore = Chroma(persist_directory="./db", embedding_function=embedder)


def retrieve_sources(query: str, k: int = 5) -> list[dict]:
    """Return chunks with their metadata intact."""
    docs = vectorstore.similarity_search(query, k=k)
    sources = []
    for i, doc in enumerate(docs, start=1):
        sources.append({
            "id": i,  # citation number shown to model
            "content": doc.page_content,
            "title": doc.metadata.get("title", "Untitled"),
            "url": doc.metadata.get("source_url", ""),
            "chunk_index": doc.metadata.get("chunk_index", 0),
        })
    return sources
```
Step 2: Inject numbered sources into the system prompt
Format the retrieved chunks as a numbered list in the system message. The model sees each chunk prefixed with its ID — [1], [2] etc. — and is instructed to reference those IDs when it uses information from that chunk. The model does not need to know the URL; it only needs to output the number. Your rendering layer holds the URL and maps the number to it.
```python
def build_system_prompt(sources: list[dict]) -> str:
    source_block = "\n".join(
        f"[{s['id']}] {s['title']}\n{s['content']}"
        for s in sources
    )
    return f"""You are a helpful assistant. Answer using only the sources below.
After each claim, cite the source number in square brackets, e.g. [1] or [2][3].
Do not invent URLs. Only reference source numbers that appear in this list.
--- SOURCES ---
{source_block}
--- END SOURCES ---"""
```
Step 3: Parse citation markers from the model output
After the model responds, parse [1], [2] patterns from the answer text and replace them with rich citation objects for your UI. A simple regex captures all markers; you then look each number up in the sources array you built in Step 1.
```typescript
interface Source {
  id: number;
  title: string;
  url: string;
}

interface CitationSegment {
  type: "text" | "citation";
  content: string; // raw text for 'text' segments, the marker itself for 'citation'
  source?: Source; // resolved source; undefined if the ID matched no source
}

function parseAnswer(
  rawAnswer: string,
  sources: Source[]
): CitationSegment[] {
  // Split on [1], [2], ... markers; the capture group keeps them in the result
  const parts = rawAnswer.split(/(\[\d+\])/g);
  return parts
    .filter((part) => part !== "") // adjacent markers like [2][3] leave empty strings
    .map((part): CitationSegment => {
      const match = part.match(/^\[(\d+)\]$/);
      if (match) {
        const id = parseInt(match[1], 10);
        const source = sources.find((s) => s.id === id);
        return { type: "citation", content: part, source };
      }
      return { type: "text", content: part };
    });
}
```
Citation UI patterns
Once the parsed segments are in hand, you have several rendering options. The best choice depends on your content density and how much screen real estate you have. All three patterns below are in production use at major AI products as of 2025–2026.
Pattern A: Inline superscript footnotes
Replace each [1] marker with a small superscript — ¹ — that is an anchor link jumping to a numbered source list below the answer. This is the academic citation model, the same pattern Wikipedia uses. It produces the cleanest reading experience: the answer prose flows uninterrupted and the sources sit neatly below. Hover previews (a tooltip showing the source title and a snippet) are a natural enhancement that avoids requiring the user to scroll down.
```tsx
import { useState } from "react";

interface Source { id: number; title: string; url: string; snippet?: string; }

function CitationMark({ source }: { source: Source }) {
  const [hovered, setHovered] = useState(false);
  // The mouse handlers sit on the wrapper, not the anchor, so the card stays
  // open while the pointer moves from the marker to the "Open source" link.
  // The card uses <span> elements so the markup stays valid inside a <p>.
  return (
    <span
      className="citation-mark-wrapper"
      onMouseEnter={() => setHovered(true)}
      onMouseLeave={() => setHovered(false)}
    >
      <a
        href={`#source-${source.id}`}
        className="citation-superscript"
        aria-label={`Source ${source.id}: ${source.title}`}
      >
        {source.id}
      </a>
      {hovered && source.snippet && (
        <span className="citation-hover-card" role="tooltip">
          <strong>{source.title}</strong>
          <span className="citation-hover-snippet">{source.snippet}</span>
          <a href={source.url} target="_blank" rel="noopener noreferrer">
            Open source
          </a>
        </span>
      )}
    </span>
  );
}
```
Pattern B: Source cards below the answer
Render a horizontal row of source cards beneath the answer. Each card shows a favicon, the page title, the domain name, and optionally a one-sentence excerpt. This is the pattern used by Perplexity and Google AI Overviews. It is the easiest of the three to implement, since no inline parsing is needed (you render the sources list you already have), and it works well when the answer is short and the sources are highly scannable.
```tsx
interface Source { id: number; title: string; url: string; snippet?: string; }

// new URL() throws on empty or malformed input, so fall back to the raw string.
function getDomain(url: string): string {
  try {
    return new URL(url).hostname.replace(/^www\./, "");
  } catch {
    return url;
  }
}

function SourceCards({ sources }: { sources: Source[] }) {
  if (!sources.length) return null;
  return (
    <div className="source-cards" role="region" aria-label="Sources">
      <span className="source-cards-label">Sources</span>
      <div className="source-cards-grid">
        {sources.map((s) => {
          const domain = getDomain(s.url);
          return (
            <a
              key={s.id}
              href={s.url}
              target="_blank"
              rel="noopener noreferrer"
              className="source-card"
              id={`source-${s.id}`}
            >
              <img
                src={`https://www.google.com/s2/favicons?domain=${encodeURIComponent(domain)}&sz=16`}
                alt=""
                className="source-card-favicon"
              />
              <div className="source-card-body">
                <span className="source-card-title">{s.title}</span>
                <span className="source-card-domain">{domain}</span>
              </div>
            </a>
          );
        })}
      </div>
    </div>
  );
}
```
Pattern C: Hover preview tooltip
The hover preview sits between superscript and full card: a tooltip that appears when the user mouses over an inline citation marker. It shows the source title, a short excerpt from the relevant chunk, and a link to open the full source. This is the pattern Copilot uses (called a "citation chip") and it works particularly well on desktop where hovering is natural. On mobile, tap-to-expand is the equivalent interaction.
| UI pattern | Best for | Implementation effort | Used by |
|---|---|---|---|
| Inline superscript + source list | Long answers, document Q&A, research tools | Medium — requires inline parsing + anchor scroll | Wikipedia model; many enterprise RAG tools |
| Source cards below answer | Short answers, search-style queries, mobile | Low — render sources[] array directly | Perplexity, Google AI Overviews |
| Hover preview tooltip | Dense interfaces, desktop-first products | Medium-high — tooltip positioning, mobile fallback | Microsoft Copilot, Granola, Glean |
Common pitfalls and how to avoid them
Pitfall 1: The model invents citation numbers
If you send the model five sources numbered [1]–[5] but the model outputs [6] or [7], it is fabricating a citation that does not exist in your context. Guard against this by validating the output: after parsing, drop any citation marker whose ID is not in the sources list, and optionally add a UI label — "Source not found" — so the user knows a claim was unsupported rather than silently losing the reference. Smaller models (7B–13B parameter range) are more prone to this; GPT-4o and Claude 3+ are more reliable at staying within the provided source IDs.
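The validation step described above can be sketched in a few lines. This is an illustrative helper rather than a fixed recipe: `validate_citations` and its `placeholder` parameter are hypothetical names, and the sketch assumes markers use the plain `[n]` form requested in the system prompt.

```python
import re

# Optional leading whitespace is captured so that removing a marker
# does not leave a stray space before the punctuation.
CITATION = re.compile(r"(\s*)\[(\d+)\]")


def validate_citations(answer: str, valid_ids: set[int],
                       placeholder: str = "") -> tuple[str, list[int]]:
    """Remove or relabel markers whose ID was never offered to the model.

    Returns the cleaned answer and the list of rejected IDs, so the caller
    can log them or surface a "Source not found" label.
    """
    rejected: list[int] = []

    def check(match: re.Match) -> str:
        cited = int(match.group(2))
        if cited in valid_ids:
            return match.group(0)  # keep valid markers untouched
        rejected.append(cited)
        return f"{match.group(1)}{placeholder}" if placeholder else ""

    return CITATION.sub(check, answer), rejected
```

Calling it with `placeholder="[source not found]"` keeps a visible label where the invented marker was. Leaving the default empty string drops the marker silently, while the returned `rejected` list still lets you count how often the model strays outside the provided IDs.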
Pitfall 2: Attribution hallucination — right fact, wrong source
This is subtler than inventing a citation entirely. The model cites [2] for a claim that is actually from [3], or cites a source that is tangentially related but does not actually support the specific claim. Research from the University of Amsterdam (2025) distinguishes between correctness (is the answer right?) and faithfulness (does the cited source actually say this?). An answer can be correct but unfaithful — the claim is true, but the cited chunk does not support it. Adding a faithfulness checker (a secondary LLM call that asks "does this source text support this claim?") is the production-grade fix, but expensive. At minimum, keep chunk sizes small: a 512-token chunk is easier to verify than a 2,000-token chunk.
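Any faithfulness check first needs the answer broken into (claim, cited chunk) pairs. A minimal sketch of that step follows. It assumes markers sit just before a sentence's closing punctuation, as in "Paris is the capital [1].", and it uses a deliberately naive sentence splitter that will mis-split abbreviations such as "e.g."; `claim_citation_pairs` is a hypothetical name.

```python
import re

# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
MARKER = re.compile(r"\s*\[(\d+)\]")


def claim_citation_pairs(answer: str,
                         sources: dict[int, str]) -> list[tuple[str, int, str]]:
    """Return (claim, source_id, chunk_text) for every resolvable citation.

    `sources` maps citation number to the chunk text shown to the model.
    Markers that point at unknown IDs are skipped here.
    """
    pairs: list[tuple[str, int, str]] = []
    for sentence in SENTENCE_END.split(answer.strip()):
        cited_ids = [int(n) for n in MARKER.findall(sentence)]
        claim = MARKER.sub("", sentence).strip()  # sentence without its markers
        for source_id in cited_ids:
            if source_id in sources:
                pairs.append((claim, source_id, sources[source_id]))
    return pairs
```

Each triple is one question for a checker: does this chunk support this claim? Smaller chunks make that question cheaper to answer, whether the checker is a model or a person.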
Pitfall 3: Stale or broken source URLs
Source cards that link to 404 pages destroy trust immediately. This is a data freshness problem: documents get moved, deleted, or put behind paywalls after indexing. Mitigations: store a last_verified timestamp on each document and show a warning badge if it is older than your freshness threshold; implement a background job that periodically checks URLs and marks broken ones as unavailable; and display the cached excerpt even when the URL is broken, so the user can see what the source said even if they cannot visit it.
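The three mitigations reduce to one status decision per source plus a periodic reachability check. The sketch below shows one way to express that. `SourceStatus`, `source_status`, `is_reachable`, and the 30-day default threshold are all assumptions made for illustration, and some servers reject HEAD requests, so a production checker may need a GET fallback.

```python
import urllib.request
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class SourceStatus(Enum):
    FRESH = "fresh"              # link normally
    STALE = "stale"              # link, but show a warning badge
    UNAVAILABLE = "unavailable"  # show the cached excerpt, disable the link


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """HEAD-request the URL; any error or 4xx/5xx response counts as unreachable."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):  # network failures, HTTP errors, malformed URLs
        return False


def source_status(last_verified: datetime, reachable: bool,
                  now: Optional[datetime] = None,
                  max_age: timedelta = timedelta(days=30)) -> SourceStatus:
    """Combine the last background check with the freshness threshold."""
    now = now or datetime.now(timezone.utc)
    if not reachable:
        return SourceStatus.UNAVAILABLE
    if now - last_verified > max_age:
        return SourceStatus.STALE
    return SourceStatus.FRESH
```

A background job would call `is_reachable` on a schedule and store the result alongside `last_verified`. The rendering layer then only needs `source_status` to choose between a normal card, a badge, and a cached-excerpt-only card.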
Pitfall 4: Citation overload
A response like "The capital of France is Paris [1][2][3]" is not better than [1] alone. Over-citing clutters the UI, trains users to ignore markers, and often means the prompt is not selecting the most relevant single source. Cap the citation count per claim (one or two markers per sentence is almost always enough) and adjust the system prompt to reinforce that: "Cite the most relevant source number for each claim — do not cite multiple sources unless they say genuinely different things."
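Prompting is the first line of defence, but a small post-processing pass can enforce the cap as well. The sketch below treats each run of adjacent markers such as `[1][2][3]` as belonging to one claim and keeps only the first few. That rests on the assumption that the model lists its most relevant source first, which is not guaranteed, and `cap_citations` is a hypothetical name.

```python
import re

MARKER_RUN = re.compile(r"(?:\[\d+\])+")  # one or more adjacent markers
MARKER = re.compile(r"\[\d+\]")


def cap_citations(answer: str, max_per_claim: int = 2) -> str:
    """Keep at most max_per_claim distinct markers in each adjacent run."""

    def trim(match: re.Match) -> str:
        markers = MARKER.findall(match.group(0))
        unique = list(dict.fromkeys(markers))  # drop repeats, keep order
        return "".join(unique[:max_per_claim])

    return MARKER_RUN.sub(trim, answer)
```

Runs separated by spaces or commas, for example `[1], [2]`, are not merged by this pattern and would need a looser regex.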
Going deeper
Once basic citation display is working, several advanced patterns are worth knowing about — especially for products where source accuracy is a core feature, not just a nice-to-have.
Span-level citations
Standard citations link a claim to a document. Span-level citations go further: they link a specific claim to a specific passage within the document, and highlight that passage when the user clicks. Adobe Acrobat AI does this for PDFs: the citation jumps to the exact page and highlights the sentence. LlamaIndex's CitationQueryEngine moves in this direction by splitting retrieved nodes into smaller citation chunks and labeling each one individually, so a citation number maps to a narrower window than the full retrieved node; lowering the citation chunk size narrows the window further. Highlighting the exact passage dramatically improves verifiability, but it requires storing bounding-box or character-offset metadata at index time.
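For plain-text sources, the character-offset variant of that metadata can be sketched as follows. This illustrates the idea and is not how any of the products above implement it: `Span`, `sentence_spans`, and `highlight` are hypothetical names, and the regex-based sentence splitter is naive (it will split on decimals and abbreviations).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    document_id: str
    start: int  # character offset where the sentence begins
    end: int    # character offset just past the sentence
    text: str


# A sentence is a run of non-terminators followed by terminators,
# or a trailing run with no terminator at all.
SENTENCE = re.compile(r"[^.!?]+[.!?]+|[^.!?]+$")


def sentence_spans(document_id: str, text: str) -> list[Span]:
    """Record where each sentence sits in the original document text."""
    spans: list[Span] = []
    for match in SENTENCE.finditer(text):
        raw = match.group(0)
        stripped = raw.strip()
        if not stripped:
            continue
        # Skip leading whitespace so the offsets point at the sentence itself.
        start = match.start() + (len(raw) - len(raw.lstrip()))
        spans.append(Span(document_id, start, start + len(stripped), stripped))
    return spans


def highlight(text: str, span: Span,
              open_tag: str = "<mark>", close_tag: str = "</mark>") -> str:
    """Wrap the cited passage in highlight tags using the stored offsets."""
    return (text[:span.start] + open_tag + text[span.start:span.end]
            + close_tag + text[span.end:])
```

Storing `start` and `end` with each indexed sentence is what lets the UI jump to and highlight the exact passage later without searching the document text again.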
Faithfulness checking as a post-processing step
A faithfulness checker is a second LLM call that runs after the main generation: for each (claim, cited_chunk) pair, it asks the model "does this chunk support this claim?" and returns a score or boolean. Libraries such as Ragas and DeepEval provide off-the-shelf faithfulness metrics you can integrate into your evaluation pipeline. Running faithfulness checks on every response in production is expensive; a more practical approach is to run them on a sampled 5–10% of responses and use the results to tune your prompt and chunking strategy.
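The sampling approach can be sketched independently of any particular evaluation library. In the example below the judge is passed in as a plain callable, so the secondary LLM call (or a Ragas or DeepEval metric) can be plugged in behind it. `Judge`, `should_check`, and `faithfulness_score` are hypothetical names for this sketch, not part of those libraries.

```python
import random
from typing import Callable, Optional

# (claim, chunk_text) -> True if the chunk supports the claim.
# In production this would wrap the secondary LLM call.
Judge = Callable[[str, str], bool]


def should_check(rng: random.Random, sample_rate: float = 0.1) -> bool:
    """Decide whether this response falls into the sampled fraction."""
    return rng.random() < sample_rate


def faithfulness_score(pairs: list[tuple[str, str]],
                       judge: Judge) -> Optional[float]:
    """Fraction of (claim, cited_chunk) pairs the judge marks as supported.

    Returns None when there is nothing to score, so an answer with no
    citations is not mistaken for a perfectly faithful one.
    """
    if not pairs:
        return None
    supported = sum(1 for claim, chunk in pairs if judge(claim, chunk))
    return supported / len(pairs)
```

Logging the score together with the prompt version and chunk size makes it possible to see whether a prompt or chunking change actually moved faithfulness, which is the point of sampling in the first place.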
Streaming citations
When your answer streams token by token, citation markers arrive mid-stream. There are two approaches to rendering them: deferred (show the text as it streams with markers left as plain brackets, then parse the full response once it completes and swap in rich citations) or progressive (detect [ patterns in the stream buffer and speculatively render a citation placeholder that resolves once the closing ] arrives). Most products use the deferred approach because it is simpler and citation rendering is not the part where latency matters most. Progressive rendering is worth the complexity only if your average response is very long and users need to start clicking sources before the answer finishes.
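The progressive approach comes down to holding back an unfinished marker until the next chunk settles it. A minimal sketch of that buffering logic follows, written in Python for consistency with the pipeline code above. `stream_segments` is a hypothetical name, and the event tuples are an assumed shape for whatever the frontend consumes.

```python
import re
from typing import Iterable, Iterator, Union

Event = tuple[str, Union[str, int]]  # ("text", "...") or ("citation", 3)

COMPLETE = re.compile(r"\[(\d+)\]")
PARTIAL = re.compile(r"\[\d*\Z")  # an unfinished marker at the very end


def _emit(text: str) -> Iterator[Event]:
    """Turn settled text into text and citation events."""
    position = 0
    for match in COMPLETE.finditer(text):
        if match.start() > position:
            yield ("text", text[position:match.start()])
        yield ("citation", int(match.group(1)))
        position = match.end()
    if position < len(text):
        yield ("text", text[position:])


def stream_segments(chunks: Iterable[str]) -> Iterator[Event]:
    """Emit events as chunks arrive, holding back a trailing "[", "[1", "[12"."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        partial = PARTIAL.search(buffer)
        if partial:
            ready, buffer = buffer[:partial.start()], buffer[partial.start():]
        else:
            ready, buffer = buffer, ""
        yield from _emit(ready)
    yield from _emit(buffer)  # flush anything still held when the stream ends
```

Only a bracket followed by digits is held back, so ordinary bracketed text such as "[see note]" passes straight through as text. A marker that never closes is flushed as plain text when the stream ends.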
Citation accessibility
Superscript numbers mean little to screen reader users unless annotated correctly: a bare "1" is announced with no indication of what it refers to. Wrap each citation mark in an <a> with an aria-label like "Source 1: Title of the source". Use aria-describedby to associate the hover tooltip with the trigger element so screen readers announce the preview content. The react-a11y-footnotes library provides a full accessible footnote implementation (FootnotesProvider + FootnoteRef + Footnotes components) that handles numbering, anchoring, and ARIA attributes automatically, which is useful if you are building a document-oriented product rather than a chat interface.
LlamaIndex and LangChain built-ins
Both major RAG frameworks offer citation utilities. LlamaIndex's CitationQueryEngine (available in llama-index-core) wraps any existing index: it splits retrieved nodes into citation-sized chunks (default 512 tokens, configurable), injects a citation prompt instructing the model to reference source numbers, and returns both the answer and the source nodes. LangChain's approach relies on structured output: .withStructuredOutput() with a Zod schema in LangChain.js, or with_structured_output() with a Pydantic model in Python, coerces the model into returning a structured response that includes an array of source IDs alongside the answer text. Both are good starting points; the custom approach in this article gives you more control over metadata shape and UI rendering.
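Whichever framework produces it, the structured response reduces to answer text plus a list of source IDs, and that shape is worth validating before it reaches the UI. The sketch below uses only the standard library and is not either framework's API. The `answer` and `source_ids` field names and the `parse_cited_answer` helper are assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CitedAnswer:
    answer: str
    source_ids: list[int]


def parse_cited_answer(raw_json: str, valid_ids: set[int]) -> CitedAnswer:
    """Validate a structured model response of the assumed shape
    {"answer": "...", "source_ids": [1, 3]}.
    """
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    answer = data.get("answer")
    ids = data.get("source_ids")
    if not isinstance(answer, str) or not isinstance(ids, list):
        raise ValueError("expected 'answer' (string) and 'source_ids' (list)")
    # Keep real integers only (bool is a subclass of int, so exclude it).
    integers = [i for i in ids if isinstance(i, int) and not isinstance(i, bool)]
    # Drop repeats and any ID that was never offered to the model.
    kept = [i for i in dict.fromkeys(integers) if i in valid_ids]
    return CitedAnswer(answer=answer, source_ids=kept)
```

The same rule applies here as for inline markers: the model only ever returns numbers, and the application maps them back to URLs it already holds.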
FAQ
Should the model output citation URLs directly, or should my app inject them?
Always inject them from your app. When you ask the model to generate URLs from memory, it produces plausible-looking but often incorrect links — models hallucinate URLs convincingly. Instead, have your retrieval layer return verified source URLs, number each source [1], [2] etc., and ask the model only to output the number. Your rendering layer maps the number back to the real URL stored in your metadata.
What is the difference between a source card and an inline citation?
A source card is a standalone UI element below the answer listing all sources used — title, domain, favicon, optional excerpt. An inline citation is a marker (superscript or bracket) placed directly inside the answer text next to the specific claim it supports. Source cards are easier to implement; inline citations give more precise attribution. Most polished products use both: inline markers in the prose, with source cards below for quick scanning.
How do I handle citations when the answer is streaming?
The simplest approach is deferred rendering: show the streamed text as it arrives with citation markers left as plain brackets, then parse the markers once the stream closes and render the final annotated version. The user briefly sees plain-text brackets before they resolve to rich citations, which is generally acceptable. Progressive rendering, which resolves markers as they arrive mid-stream, is possible but adds complexity and is only worthwhile for very long responses.
What is attribution hallucination, and how is it different from regular hallucination?
A regular hallucination is when the model states something false. An attribution hallucination is when the model cites a source that does not actually support the claim — the answer may even be correct, but the cited chunk does not say it. This is caught by faithfulness checking: a secondary evaluation that asks whether each cited passage actually entails the claim it is attached to. Tools like Ragas and DeepEval provide faithfulness metrics for this purpose.
How many sources should I retrieve and display per answer?
Retrieve 3–8 chunks for most queries. Display 2–5 source cards — more than 5 starts to feel like information overload and users stop reading them. For inline citations, cap at one or two marker numbers per sentence; citing three or four sources for a single claim suggests your retrieval step is returning redundant chunks that should be deduplicated or your prompt is not guiding the model to select the most relevant single source.
Do I need citations if I'm not building a RAG app?
If your LLM is answering from its own training knowledge rather than retrieved documents, you cannot provide document-level citations. In that case, the honest approach is to surface uncertainty hedges in the model's language ("I believe...", "as of my last training data...") and direct users to authoritative external sources for verification. Citations as a UI pattern only make sense when you have verified, retrievable source documents to link to.