AI/TLDR

talkie-lm · 2026-04-28 · notable

Talkie-1930 — 13B Open-Weight LLM Trained Only on Pre-1931 English Text

Apache-2.0 13B language model trained on 260B tokens of exclusively pre-1931 English — books, newspapers, journals, patents — to study what a model with a 1930-era worldview learns and where it fails.

talkie-lm/talkie GitHub social card

A 13B LLM with a hard 1930 cutoff: no World War II, no transistors, no internet, trained on OCR'd books and newspapers.

Key specs

LicenseApache-2.0
Parameters13B
Training tokens260B
Knowledge cutoff1930-12-31
Min vram28 GB
Hn points617
Hn comments247

What is it?

Talkie-1930 is a 13B open-weight transformer trained on 260B tokens of English text published on or before 31 December 1930. Two Apache-2.0 checkpoints ship: a base completion model and an instruction-tuned chat model. The team frames it as a research artifact for studying what a frontier-style model trained on a closed historical corpus actually knows and where it confabulates.

How does it work?

Because there was no digital publishing in 1930, the corpus was assembled by OCR'ing physical sources — books, newspapers, periodicals, scientific journals, patents, case law — and then filtered with a document-level n-gram anachronism classifier to discard text contaminated by post-1930 content. The authors note the filter is imperfect and the 13B checkpoint retains some awareness of WWII and the postwar order.

Why does it matter?

It's a clean experimental setup for capability and generalization research: how much of a modern model's reasoning comes from contemporary data, how much from scale, how much from architecture? A 1930-cutoff model is also an unusual artifact for historians, classicists, and writers of period fiction.

Who is it for?

ML researchers studying generalization and data-cutoff effects, computational historians, and authors writing in period voice

Try it

huggingface.co/talkie-lm/talkie-1930-13b-it (needs ≥28 GB CUDA VRAM)

Sources · 5 outlets

Tags

  • model
  • open-weights
  • research
  • historical-corpus
  • ocr
  • apache-2
  • 13b
  • vintage-lm

← All releases · Learn AI