AI/TLDR

What Can Fine-Tuning Actually Change? Style, Format, and Knowledge

Learn which behaviors fine-tuning reliably changes (style, format, tone) and why injecting new factual knowledge usually disappoints.

BEGINNER10 MIN READUPDATED 2026-06-12

In plain English

Fine-tuning a language model means continuing its training on a smaller, curated dataset — typically a few hundred to a few thousand examples — so that the model's behavior shifts toward what you want. But here is the thing that surprises most people: *fine-tuning is extremely good at changing how* a model responds, and surprisingly bad at teaching it what to know.**

Think of it like this. Imagine you hire a journalist who already has a deep general knowledge of the world. You want them to write in your publication's house style — punchy leads, short paragraphs, no passive voice, always end with a call to action. You don't hand them a style guide and ask them to memorize it before every article. Instead you show them 500 past articles from your best writers. After a week of immersion, the journalist starts producing copy that sounds like you without being told. That's fine-tuning working correctly: it changes habits and patterns.

Now imagine you also want that journalist to report accurately on a brand-new proprietary product your company just built — something that didn't exist when they were educated. No amount of reading your old articles will teach them the actual specs of the new product. For that they need a reference document in front of them while they write. That document-in-hand is analogous to retrieval-augmented generation (RAG). Fine-tuning cannot replace it.

Why it matters for builders

Misunderstanding what fine-tuning can change is one of the most expensive mistakes teams make when adopting LLMs. It usually plays out the same way: a team spends weeks collecting internal docs, trains a model on them, and then discovers the model hallucinates facts from those docs just as confidently as ever. The training run cost money. The delay cost more. And the confusion — "why didn't it just learn our facts?" — persists.

Knowing where fine-tuning's boundary lies helps you make three decisions faster:

  • When to fine-tune vs. RAG. If the goal is accurate recall of specific facts, dates, names, or figures, RAG is the right tool and fine-tuning will disappoint. If the goal is consistent tone, format, or a specialized skill, fine-tuning earns its cost.
  • What training data to collect. Style-and-format fine-tunes need input/output example pairs that demonstrate the behavior. Knowledge-injection attempts need a fundamentally different architecture (RAG, tool use, or periodic retraining at scale).
  • How to evaluate success. Style changes are tested by human preference or automated rubrics that grade tone and structure. Knowledge changes must be tested with factual recall benchmarks — and fine-tuning usually fails those benchmarks in ways that matter.

How fine-tuning changes the weights

A language model is a huge collection of floating-point numbers called weights — billions of them. Pretraining set those numbers by optimizing over a massive corpus. Fine-tuning runs the same gradient-descent loop, but starts from the finished pretrained weights, uses a much smaller dataset, and applies a much lower learning rate — typically 0.01% to 0.1% of the change magnitude used in pretraining. The result: weights shift a little in the directions that reduce prediction error on your examples.

Research examining which parts of the network actually change reveals an important asymmetry. Feed-forward layers — which function as associative memories that store factual associations — absorb most of the behavioral change. Attention layers adjust how the model attends to context. A fine-tuning run on style examples nudges both, but the changes are small enough that the model retains the vast majority of its pretrained associations. A fine-tuning run on new factual content tries to overwrite those associations, which is why it tends to produce confident hallucination rather than reliable recall: the pretrained associations resist being overwritten by a tiny dataset.

There is also the catastrophic forgetting risk. Because the same weights encode both old knowledge and new behavior, aggressive fine-tuning on a narrow dataset can degrade the model's general capabilities while improving the targeted behavior. Techniques like LoRA (low-rank adaptation) mitigate this by restricting weight changes to a small low-rank subspace — the pretrained weights remain frozen, and only a tiny set of adapter parameters are updated.

What fine-tuning does well: style, format, and skill

The strongest use cases for fine-tuning all share the same shape: there is a pattern that repeats across thousands of examples, and you want the model to internalize that pattern instead of being told it fresh every time.

Style and tone

A model trained on 500 examples of your brand's copywriting will reliably adopt your brand voice — without a 500-word style guide prepended to every prompt. A 2025 research paper studying tone-of-voice alignment found that fine-tuning on simulated style examples outperformed system-prompt-based approaches for maintaining consistent tone across diverse inputs. The behavior becomes implicit in the weights rather than explicit in the context window.

Output format

Enforcing structured output — valid JSON, a specific XML schema, a table with fixed columns — is where fine-tuning often delivers immediate, measurable ROI. The model stops producing the format "most of the time" and produces it reliably. This is especially valuable in pipelines where downstream code parses the model output and breaks when the format drifts.

jsonjson
// Before fine-tuning: model sometimes returns prose, sometimes JSON
"The ticket is about billing. Category: billing, Priority: high."

// After fine-tuning on your schema:
{"category": "billing", "priority": "high", "summary": "Charge appears twice"}

Narrow repetitive tasks

Classifying support tickets, extracting fields from invoices, translating internal jargon, scoring essays on a rubric — tasks where you have abundant labeled examples and a well-defined right answer. These tasks are exactly what supervised fine-tuning was designed for.

Why fine-tuning disappoints for knowledge injection

When builders try to use fine-tuning to "teach" a model facts — feeding it product documentation, internal wikis, a new API's spec — the results are almost always worse than expected. Understanding why helps you avoid the trap.

The model learns to imitate, not to remember

Supervised fine-tuning trains the model to predict the next token in your examples. If your training examples look like Q&A pairs about your product, the model learns the pattern of those exchanges — the vocabulary, the phrasing, the response shape. But the specific numerical values, version numbers, and URLs buried in the answers require precise memorization of long sequences. That requires many repetitions of each specific fact across your training data, which you rarely have. The result: the model produces answers that sound like the right format but confabulate the specific details.

Weak encoding fights back

Research on factual knowledge extraction from fine-tuned models (published 2024) found that fine-tuning on QA pairs about well-encoded pretraining facts improves recall, but fine-tuning on poorly-encoded or entirely new facts actively harms downstream factuality. The model becomes more confident without becoming more accurate — the worst possible outcome for a production system.

Rapidly changing data is a losing battle

Fine-tuning is a one-time bake. The moment your documentation updates — a new product version, a changed pricing tier, a deprecated API endpoint — the fine-tuned weights are stale. Re-running the fine-tune is slow and expensive. RAG, by contrast, updates the knowledge at query time: change the document in your vector store, and the next response reflects it instantly.

GoalFine-tuningRAG
Consistent output formatExcellentNot applicable
Brand tone and voiceExcellentLimited via prompting only
Proprietary factual recallPoor — hallucinatesExcellent
Up-to-date informationFails immediately on stale dataExcellent — update the index
Narrow domain jargonGoodGood (depends on retrieval quality)
Speed to update knowledgeHours to days for re-trainingSeconds to minutes for index update

Going deeper

For engineers who want to push past the basics, here are the concepts that explain why the behavior/knowledge asymmetry is so persistent — and the active research addressing it.

The low intrinsic dimensionality of fine-tuning updates

One of the most useful insights from the LoRA paper (Hu et al., 2021, still the dominant framework in 2025) is that the weight updates produced by fine-tuning have low intrinsic dimensionality — they lie in a subspace of the full parameter space that is much smaller than the model's total parameter count. This is why LoRA can achieve comparable results by only training low-rank adapter matrices. It also explains why fine-tuning can't easily force the model to learn arbitrary new facts: there simply isn't enough gradient signal from a small dataset to push the weights across the energy barriers that would make new facts stick.

Self-teaching and knowledge-aware fine-tuning (2024-2025 research)

Active research is working on the knowledge injection problem. Self-Tuning (2024, arXiv:2406.06326) uses a model to generate its own teaching signal, evaluating what it already knows before deciding what to fine-tune on. Open-book fine-tuning incorporates reference documents directly into the training context alongside the Q&A pair, so the model learns to use the reference rather than memorize the answer — a technique that reduces hallucination while still achieving domain specialization. These methods are promising but not yet the default practice.

Catastrophic forgetting and how to fight it

A January 2025 paper (arXiv:2501.13669) showed that layer-wise and element-wise regularization during fine-tuning significantly reduces catastrophic forgetting — the degradation of general capabilities when you fine-tune hard on a narrow task. Sharpness-aware minimization (SAM) has also been shown to preserve general capabilities better than standard Adam-based fine-tuning, especially in 7B and 13B parameter models. For most practitioners the practical takeaway is simpler: use LoRA/QLoRA and keep your epoch count low (1–3 epochs) to prevent the model from drifting too far from its pretrained prior.

The practical decision tree

When you are staring at a use case and deciding whether to fine-tune, ask these questions in order:

  1. Does a better prompt already work? If yes, stop. Prompting is free and instant.
  2. Are you adding facts or changing behavior? Facts → RAG. Behavior → fine-tuning.
  3. How often does the data change? Frequently-changing facts rule out fine-tuning even if they were learnable.
  4. Do you have 100+ high-quality labeled examples? Fine-tuning under 50 examples rarely generalizes. Under 20, you're just overfitting.
  5. Can you measure success? Define an eval set before training. If you can't define pass/fail criteria, you can't trust the output of the run.

FAQ

Can I fine-tune a model to memorize my company's internal documentation?

You can try, but the results will disappoint. Fine-tuning on documentation tends to produce a model that has absorbed the style of your docs but confabulates the specific facts — version numbers, prices, URLs, and proper nouns. For reliable recall of internal knowledge, RAG (retrieval-augmented generation) is the right tool: the docs stay in an external index and are fetched at query time, so they're always current and never hallucinated.

How many examples do I need for fine-tuning to actually change behavior?

For style and format tasks, 100–500 high-quality input/output pairs typically produce a noticeable and consistent change. Below 50 examples you risk overfitting rather than generalizing. Factual tasks are different — reliable factual recall through fine-tuning generally requires hundreds of varied phrasings of each individual fact, which is impractical for most real-world knowledge bases.

What's the risk of fine-tuning making a model worse?

Two main risks. First, catastrophic forgetting: aggressive fine-tuning on a narrow dataset can degrade the model's general capabilities — reasoning, language quality, helpfulness on out-of-domain questions. Second, confident hallucination: fine-tuning on new facts the model didn't learn deeply can make it more confident while being less accurate. Using LoRA/QLoRA with a low learning rate and 1–3 epochs significantly reduces both risks.

If I fine-tune on style, will the model lose any of its existing knowledge?

With standard LoRA-based fine-tuning on a reasonable dataset (a few hundred to a few thousand examples, 1–3 epochs), the model retains the vast majority of its pretrained knowledge. The weight changes are small and concentrated. Full fine-tuning with many epochs or a high learning rate carries more risk of overwriting pretrained representations — another reason LoRA is the default approach.

Does fine-tuning work better for knowledge that was already in the pretraining data?

Yes, and this is an important nuance. A 2024 study found that fine-tuning on QA pairs about well-encoded pretraining facts reliably improves their recall. If the model already "knows" a fact but buries it under generic responses, fine-tuning can surface it. But for genuinely new facts — things the model has never seen — fine-tuning typically introduces hallucination rather than reliable recall.

Can I use both fine-tuning and RAG together?

Yes, and this is often the best architecture for production systems. Fine-tuning shapes the model's behavior — response style, output format, domain vocabulary, structured output schemas. RAG supplies the factual content — your product docs, policies, or any knowledge that changes over time. The two techniques address complementary limitations and compound each other's strengths.

Further reading