OpenAI · 2026-04-29 · major

OpenAI: Where the Goblins Came From — How the 'Nerdy' Persona Made ChatGPT Obsess Over Little Critters

OpenAI explains why ChatGPT had been sliding into goblin and gremlin metaphors since GPT-5.1: while training the 'Nerdy' persona, the personality-customization team unknowingly gave high reward to creature-themed similes, and the habit leaked into every other personality.

An OpenAI post-mortem on how a single biased reward signal in 'Nerdy' personality training gave ChatGPT a six-month goblin obsession.

Key specs

Nerdy share of chats	2.5%
Nerdy share of goblin mentions	66.7%

What is it?

Published April 29, 2026, 'Where the goblins came from' is OpenAI's transparency post explaining a quirk many users noticed starting with GPT-5.1: ChatGPT kept reaching for goblins, gremlins, and other creature metaphors at an oddly high rate. The post traces the behaviour to the rollout of personality customization, specifically the 'Nerdy' persona, whose training inadvertently rewarded creature-themed metaphors. The post-mortem ends with OpenAI retiring the Nerdy persona entirely.

How does it work?

OpenAI says model behaviour is shaped by many small reward signals, and during training for the Nerdy personality the team gave 'particularly high rewards' to responses with creature similes. Once that signal was in place, the bias generalised beyond Nerdy into the other personalities. Nerdy remained the main source, though: it accounted for only 2.5% of all ChatGPT chats yet produced 66.7% of all goblin mentions. The cross-personality bleed got bad enough that OpenAI had to write a hard override instruction telling the model to stop, before deciding to retire Nerdy outright.
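
To put the two quoted figures in proportion, here is a minimal Python sketch of the arithmetic. It assumes both percentages describe the same population of chats and treats them as exact. The function names are illustrative and not taken from OpenAI's post.

```python
# Figures quoted in the summary above; treated as exact for illustration.
NERDY_CHAT_SHARE = 0.025     # Nerdy share of all ChatGPT chats
NERDY_GOBLIN_SHARE = 0.667   # Nerdy share of all goblin mentions


def overrepresentation(chat_share: float, mention_share: float) -> float:
    """Mentions produced by a group, relative to what its chat share predicts."""
    return mention_share / chat_share


def relative_rate(chat_share: float, mention_share: float) -> float:
    """Per-chat mention rate of the group divided by the rate of all other chats."""
    group_rate = mention_share / chat_share
    rest_rate = (1.0 - mention_share) / (1.0 - chat_share)
    return group_rate / rest_rate


lift = overrepresentation(NERDY_CHAT_SHARE, NERDY_GOBLIN_SHARE)
ratio = relative_rate(NERDY_CHAT_SHARE, NERDY_GOBLIN_SHARE)

print(f"Nerdy produced about {lift:.1f}x its proportional share of goblin mentions")
print(f"A Nerdy chat was about {ratio:.0f}x as goblin-prone as a non-Nerdy chat")
```

Under those assumptions Nerdy produced roughly 27 times its proportional share, and a Nerdy chat was roughly 78 times as goblin-prone as any other chat. The remaining third of mentions, spread over the other 97.5% of chats, is the leaked portion.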

Why does it matter?

It's a rare, narrative-style RLHF post-mortem from a frontier lab: a worked example of how a niche customization can warp the model's entire output distribution. For people building on top of ChatGPT, the lesson is concrete: subtle reward biases in one mode can leak into base behaviour, and even big labs need ad-hoc instruction overrides to patch them. It also explains a real product retirement (Nerdy is gone) rather than a hypothetical risk.

Who is it for?

RLHF researchers, prompt engineers, ChatGPT power users