Zac Hall
Our Take: LLMs really do latch onto patterns: OpenAI’s “goblin phase” is a silly example of a serious point – the way you train and reward a model can create odd, persistent behaviours that aren’t obvious from the outside. Model outputs are shaped by hidden system prompts and RL tweaks, not just “the law” or “the facts” you put in.
Your Takeaway: If you’re using LLMs in your business, assume they’ll exaggerate any incentive or pattern you bake in, sometimes in unexpected ways. Treat prompts and “personalities” like configuration, not colour – document them, review them, and stop anthropomorphising them.
OpenAI noticed that ChatGPT kept talking too much about goblins and other mythical creatures. This happened because of a past feature that rewarded creative use of such metaphors. To fix it, they told the new GPT-5.5 model not to mention these creatures unless really needed.
Highlights
The fix, in part, is a specific instruction telling the model never to talk about goblins unless it’s unambiguously relevant:

“Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query”
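OpenAI bakes its version of that rule into ChatGPT’s hidden system prompt, but the same pattern is available to anyone building on the API. Here’s a minimal sketch using the OpenAI Python SDK’s Chat Completions endpoint; the model name, guard wording, and user message are illustrative, not OpenAI’s actual internal configuration:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative guard instruction, modelled on the one quoted above.
# OpenAI applies its version in the hidden system prompt; an application
# can layer a similar rule into its own system message.
GUARD = (
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, "
    "pigeons, or other animals or creatures unless it is absolutely "
    "and unambiguously relevant to the user's query."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": GUARD},
        {"role": "user", "content": "Summarise today's release notes."},
    ],
)
print(response.choices[0].message.content)
```

An application-level system message like this simply layers on top of whatever hidden instructions the provider already applies upstream – which is exactly the dynamic the story describes.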
The goblin problem traces back to the “Nerdy” personality option that ChatGPT briefly offered.
To build that personality, OpenAI rewarded the model during training for creative use of mythical metaphors. Even after the Nerdy option was retired, the model remained unreasonably attached to gremlins, goblins, and other make-believe creatures.
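The takeaway above – treating prompts and personalities as configuration – can be as mundane as keeping them in a versioned, reviewed file rather than in inline strings. A minimal sketch, assuming a hypothetical prompts.yaml and the PyYAML library (the file name, keys, and wording are all illustrative):

```python
# prompts.yaml (illustrative; lives in version control and gets code review):
# system_prompt:
#   version: 3
#   changelog: "v3: added creature-mention guard after goblin regression"
#   text: |
#     You are a concise assistant.
#     Never mention mythical creatures unless the user asks about them.

import yaml  # PyYAML


def load_system_prompt(path: str = "prompts.yaml") -> str:
    """Load the reviewed, versioned system prompt instead of an inline string."""
    with open(path) as f:
        config = yaml.safe_load(f)
    prompt = config["system_prompt"]
    # Log the version so regressions can be traced to a specific prompt change.
    print(f"Using system prompt v{prompt['version']}")
    return prompt["text"]
```

Logging the prompt version alongside model outputs makes it far easier to trace a behavioural regression – goblin-shaped or otherwise – back to a specific prompt change.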
