Why do AI chatbots sometimes spit out answers that sound convincing but are flat-out wrong? According to new research from OpenAI, the problem isn’t just about training data—it may come down to the incentives baked into how we evaluate these systems.
The Persistent Problem of AI Hallucinations
OpenAI defines hallucinations as “plausible but false statements generated by language models.” And despite major improvements in systems like GPT-5 and ChatGPT, hallucinations remain one of the toughest challenges in AI development. In fact, researchers admit they may never fully disappear.
To highlight the issue, researchers asked a popular chatbot basic biographical questions about Adam Tauman Kalai, one of the study’s authors. Asked for the title of his PhD dissertation, the chatbot confidently produced three different answers, all of them wrong. The same happened when it was asked for his birthday. In each case, the model responded with authority, even though every detail was incorrect.
Why Models Make Confident Mistakes
So why do large language models get it so wrong? The researchers point to the pretraining process, where models learn by predicting the next word in a sentence. The training text carries no “true” or “false” labels, so the model only ever sees what fluent language looks like. The result: models get really good at consistent patterns (like grammar or spelling) but struggle with low-frequency, specific facts, like someone’s exact birthday.
As the paper explains: “Spelling and parentheses follow consistent patterns, so errors there disappear with scale. But arbitrary low-frequency facts cannot be predicted from patterns alone and hence lead to hallucinations.”
The Role of Bad Incentives
But the research doesn’t stop at training. It argues that the way models are currently evaluated sets up the wrong incentives. Evaluations typically reward accuracy without penalizing bad guesses—similar to a multiple-choice test where guessing gives you a shot at points, while leaving it blank guarantees zero.
That means a model that always guesses can outscore one that admits uncertainty, because a confident wrong answer costs nothing while a lucky one earns full credit. Over time, models learn to “guess” instead of ever saying “I don’t know.”
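The incentive described above can be made concrete with a little arithmetic. The sketch below is only an illustration, not code from the paper: the point values and the function names `expected_guess_score` and `best_action` are assumptions chosen to mirror an accuracy-only test, where a right answer earns one point and both wrong answers and “I don’t know” earn zero.

```python
# Hypothetical point values for an accuracy-only evaluation (assumed for illustration).
CORRECT_POINTS = 1.0
WRONG_POINTS = 0.0     # a wrong answer costs nothing
ABSTAIN_POINTS = 0.0   # "I don't know" earns nothing


def expected_guess_score(p_correct: float) -> float:
    """Average score for answering when the model is right with probability p_correct."""
    return p_correct * CORRECT_POINTS + (1.0 - p_correct) * WRONG_POINTS


def best_action(p_correct: float) -> str:
    """Return whichever option has the higher expected score under this grading."""
    if expected_guess_score(p_correct) > ABSTAIN_POINTS:
        return "guess"
    return "abstain"


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.01):
        print(f"confidence {p:.2f}: {best_action(p)}")
```

Under these assumed values, guessing comes out ahead at any confidence above zero, so a model tuned to maximize this score has no reason to abstain.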
A Different Way to Grade AI
OpenAI’s proposal is simple but potentially game-changing: evaluations should penalize confident errors more than expressions of uncertainty. Much as some standardized tests use negative marking to discourage blind guessing (as the SAT did before its 2016 redesign), AI tests could give partial credit for appropriately expressing uncertainty and stop rewarding lucky guesses.
“If the main scoreboards keep rewarding lucky guesses, models will keep learning to guess,” the researchers argue.
In practice, this means AI developers can’t just tack on a few “uncertainty-aware” tests as side experiments. Instead, the entire evaluation system needs a rethink if we want to see more reliable AI responses.
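To see how a penalty for wrong answers could change the leaderboard, here is a minimal sketch. Everything in it is made up for illustration: the five-question quiz, the two answer sheets, the `grade` function, and the penalty of one point per wrong answer are assumptions, not the scoring rule or data from the paper.

```python
from typing import List, Optional


def grade(answers: List[Optional[str]], truth: List[str], wrong_penalty: float) -> float:
    """Score a quiz: +1 per correct answer, 0 per abstention (None),
    and -wrong_penalty per wrong answer."""
    total = 0.0
    for given, correct in zip(answers, truth):
        if given is None:
            continue  # "I don't know" neither earns nor loses points
        total += 1.0 if given == correct else -wrong_penalty
    return total


# Hypothetical answer key and two hypothetical models.
truth = ["A", "B", "C", "D", "E"]
guesser = ["A", "B", "C", "X", "Y"]       # always answers: 3 right (one a lucky guess), 2 wrong
cautious = ["A", "B", None, None, None]   # answers only when sure: 2 right, 3 abstentions

if __name__ == "__main__":
    for label, penalty in (("accuracy-only", 0.0), ("penalized", 1.0)):
        print(label,
              "guesser:", grade(guesser, truth, penalty),
              "cautious:", grade(cautious, truth, penalty))
```

With no penalty, the guesser leads 3 to 2. With a one-point penalty, the guesser drops to 1 while the cautious model stays at 2. More generally, with a penalty of w points per wrong answer, answering only pays off when the chance of being right exceeds w / (1 + w), which is 50% in this example.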
Why It Matters
Hallucinations might sound like a quirky side effect of chatbots, but they have real-world consequences. Whether a model is spreading misinformation or giving dangerously wrong medical or legal advice, the stakes are too high to ignore. By shifting how we grade AI systems, OpenAI suggests we could train models to value accuracy and honesty over empty confidence.
What do you think: should AI be trained to say “I don’t know” more often, even if it makes the chatbot feel less human? Share your thoughts below.




