👹

What Are GPT Goblins?

GPT Goblins is the AI meme about OpenAI's models developing a strange obsession with goblins, gremlins, raccoons, and other creatures in their metaphors — plus the official investigation, Codex prompt rule, and community hunt for the creatures in the logs.

Updated May 8, 2026

Latest GPT Goblins News

The story moved from a weird Codex prompt line to an official OpenAI post, then into Reddit, Hacker News, X, and tech media as people compared sightings and tried to trigger the old style.

Apr 29, 2026

OpenAI publishes the official post

OpenAI traced the creature-word spike to the Nerdy personality reward signal, SFT data reuse, and transfer into non-Nerdy samples. It also confirmed that GPT-5.5 began training before the root cause was found.

Source: OpenAI

Apr 30-May 2, 2026

Tech media picks up the Codex prompt

Wired, Ars Technica, PC Gamer, PCWorld, Android Authority, Gizmodo, and others covered the repeated Codex instruction and the community reaction to a model being told to avoid creature metaphors.

Source: Ars Technica

May 6, 2026

Researchers frame it as incentive drift

Northeastern coverage highlighted the broader lesson: small reward preferences can compound into visible model behavior, even when the behavior looks silly on the surface.

Source: Northeastern Global News

The Short Explanation

Here's the three-step story of how AI models started talking like fantasy creatures.

🌱

Started in GPT-5.1

After GPT-5.1 launched in November 2025, "goblin" usage in ChatGPT rose by 175% and "gremlin" by 52%. At first it seemed like a small lexical quirk.

🤓

Amplified by "Nerdy"

ChatGPT's "Nerdy" personality accounted for only 2.5% of responses but 66.7% of all goblin mentions. Its reward signal was boosting creature metaphors.
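To see how lopsided those two percentages are, here is a quick back-of-the-envelope calculation. It uses only the figures quoted above; the function and variable names are ours for illustration, and the per-response comparison assumes goblin mentions were counted the same way for Nerdy and non-Nerdy responses.

```python
def overrepresentation(share_of_mentions: float, share_of_responses: float) -> float:
    """Ratio of a group's share of mentions to its share of all responses."""
    return share_of_mentions / share_of_responses


# Figures quoted in the text above.
NERDY_RESPONSES = 0.025  # Nerdy was 2.5% of responses
NERDY_MENTIONS = 0.667   # ...but 66.7% of goblin mentions

nerdy_ratio = overrepresentation(NERDY_MENTIONS, NERDY_RESPONSES)
other_ratio = overrepresentation(1 - NERDY_MENTIONS, 1 - NERDY_RESPONSES)

# How much more goblin-prone a Nerdy response was than a non-Nerdy one.
rate_ratio = nerdy_ratio / other_ratio

print(f"Nerdy share of mentions vs. share of responses: {nerdy_ratio:.1f}x")
print(f"Nerdy vs. non-Nerdy goblin rate per response: {rate_ratio:.1f}x")
```

Under these assumptions, Nerdy responses show up among goblin mentions roughly 27 times more often than their size alone would predict, and a Nerdy response was on the order of 78 times as goblin-prone as a non-Nerdy one.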

🔄

Leaked into GPT-5.5

GPT-5.5 started training before the root cause was found. When tested in Codex, employees immediately noticed the goblins. OpenAI added a prompt instruction to suppress them.

Why Did This Happen?

The technical root cause: a reinforcement learning reward signal created an unintended feedback loop.

The "Nerdy" Personality Prompt

ChatGPT offers personality customization. The "Nerdy" personality used this system prompt:

"You are an unapologetically nerdy, playful and wise AI mentor... You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed..."

The RL Feedback Loop

OpenAI's analysis found the Nerdy personality reward showed a clear tendency to score outputs containing "goblin" or "gremlin" higher, with positive uplift in 76.2% of datasets. Here's how the loop worked:

1. Playful style is rewarded. The Nerdy personality reward favors creative, playful language.

2. Some rewarded outputs contain creature tics. "Little goblin" and "chaos gremlin" get high scores.

3. The tic appears more often in rollouts. The model learns that creature metaphors = higher rewards.

4. Rollouts are used for SFT. Model-generated outputs with goblins enter supervised fine-tuning data.

5. The model gets even more comfortable with the tic. Goblins spread beyond Nerdy to all conversations.
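The loop above can be sketched as a toy simulation. This is not OpenAI's training setup: the update rules, the starting rate, the reward bonus, and the SFT mixing weight are all made-up assumptions, chosen only to show how a small per-round preference compounds once rollouts are fed back into training data.

```python
def rl_step(p_tic: float, reward_bonus: float) -> float:
    """Reward-weighted update: outputs with the tic count (1 + reward_bonus) times."""
    weighted_tic = p_tic * (1.0 + reward_bonus)
    weighted_plain = 1.0 - p_tic
    return weighted_tic / (weighted_tic + weighted_plain)


def sft_step(p_base: float, p_rollouts: float, mix: float) -> float:
    """SFT on a blend of existing data and model-generated rollouts."""
    return (1.0 - mix) * p_base + mix * p_rollouts


def simulate(rounds: int = 5, p_start: float = 0.01,
             reward_bonus: float = 0.5, mix: float = 0.5) -> list[float]:
    """Return the tic rate after each round (all parameters are hypothetical)."""
    history = [p_start]
    p = p_start
    for _ in range(rounds):
        p_after_rl = rl_step(p, reward_bonus)  # steps 1-3: the reward favors the tic
        p = sft_step(p, p_after_rl, mix)       # steps 4-5: rollouts re-enter training
        history.append(p)
    return history


for round_number, rate in enumerate(simulate()):
    print(f"round {round_number}: tic rate {rate:.4f}")
```

With any positive `reward_bonus` the rate climbs every round; setting it to zero keeps the rate flat, which is the toy equivalent of removing the goblin-affine reward signal.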

Cross-Contamination

The critical finding: as goblin mentions increased under the Nerdy personality, they increased by nearly the same proportion in samples without it. Reinforcement learning doesn't guarantee behaviors stay scoped to the condition that produced them. Once the tic was rewarded, it spread through SFT data reuse to the entire model — affecting all users, not just "Nerdy" ones.
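One way to build intuition for this leak is a two-parameter toy model. It is only an illustration under loud assumptions: the tic probability is a sigmoid of a shared weight plus a Nerdy-only weight, the reward is simply "the tic appeared", and training only ever sees the Nerdy condition. None of this reflects OpenAI's actual architecture, and the toy does not reproduce the "nearly the same proportion" figure; it only shows that shared parameters carry the habit elsewhere.

```python
import math

START_SHARED = -4.0  # hypothetical starting weight: the tic is rare everywhere


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tic_probability(shared: float, nerdy_only: float, is_nerdy: bool) -> float:
    """Chance of a creature tic; the Nerdy weight applies only in Nerdy mode."""
    return sigmoid(shared + (nerdy_only if is_nerdy else 0.0))


def train_on_nerdy_only(steps: int = 200, lr: float = 0.1) -> tuple[float, float]:
    """Gradient ascent on the tic probability, using Nerdy samples only."""
    shared, nerdy_only = START_SHARED, 0.0
    for _ in range(steps):
        p = tic_probability(shared, nerdy_only, is_nerdy=True)
        grad = p * (1.0 - p)  # derivative of the sigmoid with respect to its input
        # Both weights feed the Nerdy output, so both receive the same gradient.
        shared += lr * grad
        nerdy_only += lr * grad
    return shared, nerdy_only


before = tic_probability(START_SHARED, 0.0, is_nerdy=False)
shared, nerdy_only = train_on_nerdy_only()
after = tic_probability(shared, nerdy_only, is_nerdy=False)
print(f"non-Nerdy tic rate before: {before:.4f}, after: {after:.4f}")
```

Even though no non-Nerdy sample was ever rewarded, the non-Nerdy tic rate rises because part of the update lands on the shared weight.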

The Forbidden Creature List

These are the creatures OpenAI's Codex was specifically instructed to avoid mentioning. Yes, pigeons made the list.

👹

Goblins

The original offender. "Legal goblins," "chaos goblins," "hiding like little goblins."

👿

Gremlins

The goblin's close cousin. Gremlins in the machine, gremlins in your code.

🦝

Raccoons

Unexpected addition. Apparently AI thinks raccoons explain complex systems.

🧌

Trolls

Not internet trolls — fantasy trolls guarding bridges of bureaucracy.

👾

Ogres

Like onions, ogres have layers. So does every AI explanation, apparently.

🐦

Pigeons

The most surprising entry. Pigeons: the creature nobody expected on the ban list.

Source: OpenAI's Codex model instructions on GitHub
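If you want to hunt for these creatures in your own chat or agent logs, a small word counter is enough. The sketch below is our own illustration, not OpenAI's mitigation (which, per the source, is a prompt instruction); the word list is taken from the cards above rather than copied from the Codex instructions, and it only handles the singular and simple "-s" plural forms.

```python
import re
from collections import Counter

# Creature list as given on this page (an assumption, not the verbatim Codex rule).
CREATURES = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]

# Whole-word match with an optional plural "s", case-insensitive.
PATTERN = re.compile(r"\b(" + "|".join(CREATURES) + r")s?\b", re.IGNORECASE)


def count_creatures(text: str) -> Counter:
    """Count creature-word sightings in a piece of model output."""
    return Counter(match.group(1).lower() for match in PATTERN.finditer(text))


sample = "The bug was hiding like little goblins. A chaos Gremlin did it."
print(count_creatures(sample))
```

The word boundaries keep it from flagging "progress" (which contains "ogre") or "trolling", but it cannot tell a metaphor from a legitimate mention, so treat the counts as a starting point for manual review.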

Timeline of Events

Before November 2025

Early User Sightings

OpenAI says November was when it clearly saw the pattern, but linked community reports suggest some users noticed goblin and gremlin language earlier.

November 2025

GPT-5.1 Launches — Goblins Appear

Users start reporting odd creature metaphors. "Goblin" usage rises 175%, "gremlin" by 52%. It looks like a small quirk.

Early 2026

Community Notices

Reddit, Hacker News, and X users compare odd creature metaphors in ChatGPT and coding agents. At this stage, most people treat it as a funny style quirk.

March 2026

GPT-5.4 Launches — Goblins Multiply

The goblin problem gets significantly worse. OpenAI investigates and traces the root cause to the "Nerdy" personality reward signal.

Mid-March 2026

"Nerdy" Personality Retired

OpenAI retires the Nerdy personality, removes the goblin-affine reward signal, and filters creature words from training data.

April 23, 2026

GPT-5.5 Launches — Goblins Persist

GPT-5.5 started training before the root cause was found. Codex testers immediately spot the goblins. A developer-prompt instruction is added to suppress them.

Late April 2026

Codex Prompt Becomes the Meme

Developers find the explicit no-creature rule in Codex model instructions. Coverage and social posts focus on why a coding agent needs such a strangely specific warning.

April 29, 2026

OpenAI Publishes "Where the Goblins Came From"

OpenAI releases an official blog post explaining the full technical root cause, including the RL feedback loop and cross-contamination mechanism.

May 2026

Discussion Shifts to Model Incentives

Forums and media move from jokes to the broader lesson: tiny reward preferences, synthetic data reuse, and style prompts can turn into visible model-wide behavior.

Forum and Social Media Pulse

The live discussion is less about fantasy lore and more about model incentives, prompt patches, and whether funny style tics should be trained out or exposed as toggles.

Reddit / r/OpenAI

Official explanation thread

Users discuss OpenAI's post, debate whether the issue began before GPT-5.1, and trade examples of creature wording still appearing in ChatGPT and Codex-adjacent tools.

Reddit / Codex searches

Codex prompt discovery

Developers focus on the public Codex instruction, prompt hierarchy, duplicated wording, and whether the mitigation belongs in the model, the harness, or a user-facing switch.

Hacker News

Training-data and RL debate

HN discussion treats the story as a concrete example of reward hacking, transfer from personality tuning, and the difficulty of tracing subtle language habits across model generations.

X / tech accounts

Screenshots and "goblin mode"

X posts amplified screenshots of GPT-5.5 using creature metaphors, jokes about "extra goblins," and experiments that remove the Codex suppression line to see the old style return.

Goblin Mode Generator

Experience what it's like when your AI assistant goes full goblin mode. Enter any topic and watch the creatures invade.

Frequently Asked Questions

What are GPT Goblins?

GPT Goblins refers to the unusual tendency of OpenAI's language models, visible from around GPT-5.1, to use "goblin," "gremlin," "raccoon," and other creature metaphors in responses. It became a viral AI meme after people found the anti-creature instruction in Codex.

Why does ChatGPT talk about goblins?

The root cause was the reward signal behind ChatGPT's "Nerdy" personality customization option. During reinforcement learning, that signal inadvertently gave higher scores to outputs containing creature metaphors, and the habit then spread to the broader model through training data reuse.

What is the Codex goblin rule?

OpenAI added a developer-prompt instruction to Codex that specifically tells the model to avoid using words like goblins, gremlins, raccoons, trolls, ogres, and pigeons. This instruction is visible in the open-source Codex repository on GitHub.

Did OpenAI fix the goblin issue?

Partially. OpenAI retired the "Nerdy" personality in March 2026, removed the goblin-affine reward signal, and filtered training data. GPT-5.5 had already started training before the root cause was found, so Codex added a developer-prompt instruction as an application-level mitigation.

What is the latest news as of May 2026?

The latest cycle started with media coverage of Codex's explicit no-creature instruction in late April 2026. OpenAI then published "Where the goblins came from" on April 29, explaining the reward feedback loop. Since then, Reddit, Hacker News, X, and tech outlets have focused on the public Codex prompt, "goblin mode" workarounds, and what this says about RL training incentives.

Why are raccoons and pigeons on the forbidden list?

OpenAI's investigation revealed a whole family of creature "tic words" beyond goblins and gremlins. Raccoons, trolls, ogres, and pigeons were all identified as words the model was overusing. Interestingly, "frog" was investigated too but turned out to be mostly legitimate usage.

What was the "Nerdy" personality?

One of ChatGPT's personality customization options. Its prompt instructed the model to be "unapologetically nerdy, playful and wise" and to "undercut pretension through playful use of language." The reward signal trained for this personality favored playful language and inadvertently amplified creature metaphors, which then spread across the entire model.