OpenAI publishes the official post
OpenAI traced the creature-word spike to the Nerdy personality reward signal, SFT data reuse, and transfer into non-Nerdy samples. It also confirmed GPT-5.5 began training before the root cause was fixed.
GPT Goblins is the AI meme about OpenAI's models developing a strange obsession with goblins, gremlins, raccoons, and other creatures in their metaphors, along with the official investigation, the Codex prompt rule, and the community hunt for the creatures in the logs.
The story moved from a weird Codex prompt line to an official OpenAI post, then into Reddit, Hacker News, X, and tech media as people compared sightings and tried to trigger the old style.
Wired, Ars Technica, PC Gamer, PCWorld, Android Authority, Gizmodo, and others covered the repeated Codex instruction and the community reaction to a model being told to avoid creature metaphors.
Northeastern coverage highlighted the broader lesson: small reward preferences can compound into visible model behavior, even when the behavior looks silly on the surface.
Here's the three-step story of how AI models started talking like fantasy creatures.
After GPT-5.1 launched in November 2025, "goblin" usage in ChatGPT rose by 175% and "gremlin" by 52%. At first it seemed like a small lexical quirk.
ChatGPT's "Nerdy" personality accounted for only 2.5% of responses but 66.7% of all goblin mentions. Its reward signal was boosting creature metaphors.
GPT-5.5 started training before the root cause was found. When tested in Codex, employees immediately noticed the goblins. OpenAI added a prompt instruction to suppress them.
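To make the Nerdy numbers above concrete, here is a small worked calculation. It is a rough sketch, not OpenAI's analysis: it assumes goblin mentions are counted per response, and the function names are made up for illustration.

```python
# Figures quoted above: Nerdy's share of all responses and of all goblin mentions.
NERDY_SHARE_OF_RESPONSES = 0.025
NERDY_SHARE_OF_GOBLIN_MENTIONS = 0.667


def overrepresentation(share_of_mentions: float, share_of_responses: float) -> float:
    """How many times larger the group's share of mentions is than its share of responses."""
    return share_of_mentions / share_of_responses


def relative_rate(share_of_mentions: float, share_of_responses: float) -> float:
    """Per-response mention rate of the group divided by the rate for everyone else.

    Assumption (not from the source): one mention corresponds to one response.
    """
    group_rate = share_of_mentions / share_of_responses
    rest_rate = (1.0 - share_of_mentions) / (1.0 - share_of_responses)
    return group_rate / rest_rate


lift = overrepresentation(NERDY_SHARE_OF_GOBLIN_MENTIONS, NERDY_SHARE_OF_RESPONSES)
ratio = relative_rate(NERDY_SHARE_OF_GOBLIN_MENTIONS, NERDY_SHARE_OF_RESPONSES)

print(f"Nerdy is over-represented among goblin mentions by about {lift:.1f}x")
print(f"Nerdy responses mention goblins about {ratio:.1f}x as often as other responses")
```

Under that assumption, Nerdy's share of goblin mentions is roughly 27 times its share of traffic, and a Nerdy response is roughly 78 times as likely to mention a goblin as any other response.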
The technical root cause: a reinforcement learning reward signal created an unintended feedback loop.
ChatGPT offers personality customization. The "Nerdy" personality used this system prompt:
"You are an unapologetically nerdy, playful and wise AI mentor... You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed..."
OpenAI's analysis found the Nerdy personality reward showed a clear tendency to score outputs containing "goblin" or "gremlin" higher, with positive uplift in 76.2% of datasets. Here's how the loop worked:
1. Playful style is rewarded. The Nerdy personality reward favors creative, playful language.
2. Some rewarded outputs contain creature tics. "Little goblin" and "chaos gremlin" get high scores.
3. The tic appears more often in rollouts. The model learns that creature metaphors earn higher rewards.
4. Rollouts are used for SFT. Model-generated outputs with goblins enter supervised fine-tuning data.
5. The model gets even more comfortable with the tic. Goblins spread beyond Nerdy to all conversations.
The critical finding: as goblin mentions increased under the Nerdy personality, they increased by nearly the same proportion in samples without it. Reinforcement learning doesn't guarantee behaviors stay scoped to the condition that produced them. Once the tic was rewarded, it spread through SFT data reuse to the entire model — affecting all users, not just "Nerdy" ones.
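The loop and the spillover described above can be sketched as a toy simulation. This is only an illustration under loudly stated assumptions: the two-rate model, the `uplift` and `reuse_fraction` parameters, and the starting rate are all hypothetical and are not taken from OpenAI's post or training setup.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TicRates:
    nerdy: float  # probability of a creature tic under the Nerdy personality
    other: float  # probability of a creature tic in every other condition


def rl_step(p: float, uplift: float) -> float:
    """Reward-weighted update: outputs containing the tic get weight 1 + uplift."""
    weighted = p * (1.0 + uplift)
    return weighted / (weighted + (1.0 - p))


def sft_reuse(rates: TicRates, reuse_fraction: float) -> TicRates:
    """Mix Nerdy rollouts into SFT data that is not conditioned on the personality."""
    other = (1.0 - reuse_fraction) * rates.other + reuse_fraction * rates.nerdy
    return TicRates(nerdy=rates.nerdy, other=other)


def simulate(rounds: int, uplift: float = 0.2, reuse_fraction: float = 0.1,
             start: float = 0.001) -> List[TicRates]:
    """Alternate an RL step on Nerdy with an SFT reuse step (all values hypothetical)."""
    rates = TicRates(nerdy=start, other=start)
    history = [rates]
    for _ in range(rounds):
        rates = TicRates(nerdy=rl_step(rates.nerdy, uplift), other=rates.other)
        rates = sft_reuse(rates, reuse_fraction)
        history.append(rates)
    return history


for round_number, rates in enumerate(simulate(rounds=10)):
    print(f"round {round_number:2d}  nerdy={rates.nerdy:.5f}  other={rates.other:.5f}")
```

In this toy model the non-Nerdy rate only moves because rollouts are reused; setting `reuse_fraction=0.0` keeps it at its starting value, which mirrors the point that the reward was scoped to Nerdy but the behavior was not.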
These are the creatures OpenAI's Codex was specifically instructed to avoid mentioning. Yes, pigeons made the list.
Goblins: The original offender. "Legal goblins," "chaos goblins," "hiding like little goblins."
Gremlins: The goblin's close cousin. Gremlins in the machine, gremlins in your code.
Raccoons: Unexpected addition. Apparently AI thinks raccoons explain complex systems.
Trolls: Not internet trolls, but fantasy trolls guarding bridges of bureaucracy.
Ogres: Like onions, ogres have layers. So does every AI explanation, apparently.
Pigeons: The most surprising entry. Pigeons: the creature nobody expected on the ban list.
Source: OpenAI's Codex model instructions on GitHub
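If you want to hunt for these creatures in your own transcripts, a simple counter is enough to get started. The sketch below is an assumption-laden illustration: the word list mirrors the six creatures above, while the regular expression and the `count_creatures` helper are hypothetical and not part of any OpenAI tooling.

```python
import re
from collections import Counter

# The six creatures named above; singular forms, with an optional plural "s" in the pattern.
CREATURES = ("goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon")
CREATURE_PATTERN = re.compile(r"\b(" + "|".join(CREATURES) + r")s?\b", re.IGNORECASE)


def count_creatures(text: str) -> Counter:
    """Count whole-word creature mentions, folding case and plurals together."""
    return Counter(match.group(1).lower() for match in CREATURE_PATTERN.finditer(text))


sample = "Legal goblins, chaos goblins, and gremlins in your code."
print(count_creatures(sample))
```

The word boundaries keep words such as "progress" or "trolley" from counting as ogres or trolls.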
OpenAI says November was when it clearly saw the pattern, but linked community reports suggest some users noticed goblin and gremlin language earlier.
Users start reporting odd creature metaphors. "Goblin" usage rises 175%, "gremlin" by 52%. It looks like a small quirk.
Reddit, Hacker News, and X users compare odd creature metaphors in ChatGPT and coding agents. At this stage, most people treat it as a funny style quirk.
The goblin problem gets significantly worse. OpenAI investigates and traces the root cause to the "Nerdy" personality reward signal.
OpenAI retires the Nerdy personality in March 2026, removes the goblin-affine reward signal, and filters creature words from training data.
GPT-5.5 started training before the root cause was found. Codex testers immediately spot the goblins. A developer-prompt instruction is added to suppress them.
Developers find the explicit no-creature rule in Codex model instructions. Coverage and social posts focus on why a coding agent needs such a strangely specific warning.
On April 29, 2026, OpenAI publishes "Where the goblins came from," an official blog post explaining the full technical root cause, including the RL feedback loop and cross-contamination mechanism.
Forums and media move from jokes to the broader lesson: tiny reward preferences, synthetic data reuse, and style prompts can turn into visible model-wide behavior.
The live discussion is less about fantasy lore and more about model incentives, prompt patches, and whether funny style tics should be trained out or exposed as toggles.
Users discuss OpenAI's post, debate whether the issue began before GPT-5.1, and trade examples of creature wording still appearing in ChatGPT and Codex-adjacent tools.
Developers focus on the public Codex instruction, prompt hierarchy, duplicated wording, and whether the mitigation belongs in the model, the harness, or a user-facing switch.
HN discussion treats the story as a concrete example of reward hacking, transfer from personality tuning, and the difficulty of tracing subtle language habits across model generations.
X posts amplified screenshots of GPT-5.5 using creature metaphors, jokes about "extra goblins," and experiments that remove the Codex suppression line to see the old style return.
GPT Goblins refers to a phenomenon where OpenAI's language models, starting visibly around GPT-5.1, developed an unusual tendency to use words like "goblin," "gremlin," "raccoon," and other creature metaphors in responses. It became a viral AI meme after people found the anti-creature instruction in Codex.
The root cause was ChatGPT's "Nerdy" personality customization feature. The reinforcement learning reward signal for the Nerdy personality inadvertently gave higher scores to outputs containing creature metaphors, which then spread to the broader model through training data reuse.
OpenAI added a developer-prompt instruction to Codex that specifically tells the model to avoid using words like goblins, gremlins, raccoons, trolls, ogres, and pigeons. This instruction is visible in the open-source Codex repository on GitHub.
The goblin problem is only partially fixed. OpenAI retired the "Nerdy" personality in March 2026, removed the goblin-affine reward signal, and filtered training data. GPT-5.5 had already started training before the root cause was found, so Codex added a developer-prompt instruction as an application-level mitigation.
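Filtering training data is the easiest of these mitigations to picture. Here is a minimal sketch of a keyword filter, assuming plain-text examples; the word list, pattern, and `filter_examples` helper are hypothetical, and nothing here reflects how OpenAI actually filtered its data.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical block list based on the creatures named in the Codex instruction.
TIC_WORDS = ("goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon")
TIC_PATTERN = re.compile(r"\b(?:" + "|".join(TIC_WORDS) + r")s?\b", re.IGNORECASE)


def filter_examples(examples: Iterable[str]) -> Tuple[List[str], int]:
    """Return the examples with no tic word, plus the number of examples dropped."""
    items = list(examples)
    kept = [text for text in items if not TIC_PATTERN.search(text)]
    return kept, len(items) - len(kept)


examples = [
    "Refactor the parser into smaller functions.",
    "There are gremlins in your code.",
    "Steady progress on the build pipeline.",
]
kept, dropped = filter_examples(examples)
print(f"kept {len(kept)} examples, dropped {dropped}")
```

A blunt filter like this would also drop legitimate mentions, such as a real question about raccoons, so a production filter would likely need context-aware rules rather than a bare keyword match.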
The latest cycle started with media coverage of Codex's explicit no-creature instruction in late April 2026. OpenAI then published "Where the goblins came from" on April 29, explaining the reward feedback loop. Since then, Reddit, Hacker News, X, and tech outlets have focused on Codex prompt leaks, "goblin mode" workarounds, and what this says about RL training incentives.
OpenAI's investigation revealed a whole family of creature "tic words" beyond goblins and gremlins. Raccoons, trolls, ogres, and pigeons were all identified as words the model was overusing. Interestingly, "frog" was investigated too but turned out to be mostly legitimate usage.
The Nerdy personality was one of ChatGPT's personality customization options. Its prompt instructed the model to be "unapologetically nerdy, playful and wise" and to "undercut pretension through playful use of language." The reward for this playfulness inadvertently amplified creature metaphors across the entire model.