Artificial intelligence is known for its ability to imitate, and even surpass, certain human cognitive functions. However, one phenomenon intrigues and concerns both users and researchers: AI ‘hallucinations’, which are erroneous or fabricated responses.
With OpenAI’s new reasoning models, o3 and o4-mini, the problem appears to be getting worse.
Why do these advanced systems, supposedly representing the pinnacle of technology, stray into imagination more often than their predecessors like GPT-4 or o1? And what solutions can limit this phenomenon?
This article, written by the Yiaho Team, explores the reasons behind these deviations and potential remedies.
Hallucinations: A Side Effect of AI Sophistication?
Large language models like o3 and o4-mini are designed for complex reasoning, relying on massive neural networks trained on colossal amounts of textual data.
This sophistication allows them to produce creative, nuanced, and often impressive responses. But this strength is also their Achilles’ heel.
OpenAI’s own evaluations show that these models, in certain situations, produce erroneous responses at alarming rates: on PersonQA, an in-house benchmark of questions about people, o3 hallucinated in about a third of cases (33%) and o4-mini in nearly half (48%).
These rates are significantly higher than those of older models such as GPT-4 or o1, both available for free on our Yiaho platform.
Why Do These Recent AIs Hallucinate?
OpenAI itself has said that more research is needed to understand why, but part of the answer likely lies in the very nature of these advanced models. To generate original and relevant responses, AIs must extrapolate beyond the data they were trained on. This process, akin to a form of creativity, pushes them to explore hypotheses or fill gaps in their knowledge.
While this capability allows for innovative responses, it also opens the door to errors: the AI can “imagine” facts or connections that don’t exist, much like a human who, lacking information, lets their imagination fill in the blanks.
The more a model is trained to reason autonomously, the more it risks venturing into uncertain territory.
New Models: Increased Complexity, Amplified Risk
The o3 and o4-mini models are distinguished by their optimization for reasoning, an advance that allows them to solve complex problems such as mathematical puzzles or in-depth contextual analyses. But this sophistication comes at a cost.
In seeking to maximize relevance and originality, these models make bolder inferences, which leaves them more likely to produce unfounded responses.
Unlike their predecessors, which often remained more conservative in their predictions, o3 and o4-mini take more “creative risks,” which increases the probability of hallucinations. OpenAI’s own report notes that o3 tends to make more claims overall, producing more accurate claims but also more inaccurate, hallucinated ones.
Another key factor is the scale of the data processed.
The new models, trained on even vaster and more diverse corpora, must navigate an ocean of information that is sometimes contradictory or ambiguous. When they encounter areas of uncertainty, they can generate plausible but false responses, because their internal logic prioritizes apparent coherence over absolute veracity.
This phenomenon is less pronounced in models like GPT-4, which, although powerful, adopt a more cautious approach.
Towards Concrete Solutions: The Rise of Retrieval-Augmented Generation
Faced with this challenge, researchers are exploring approaches to anchor AI responses in reality. One of the most promising solutions is Retrieval-Augmented Generation (RAG).
This technique, already presented in our AI dictionary, frees the AI from relying solely on its internal knowledge, which is limited, and sometimes biased, by its training data.
Instead, the system first retrieves relevant passages from reliable, up-to-date external sources (documents, databases, search results) and places them in the model’s input alongside the question, so the model can check or complete its answer against them.
For example, an AI using RAG could cross-reference a question with scientific articles or verified sources before formulating a response, thereby reducing the risk of fabrication.
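To make the retrieval step more concrete, here is a minimal sketch in Python, assuming a tiny in-memory document collection and simple bag-of-words cosine similarity for ranking. The names `DOCUMENTS`, `retrieve` and `build_grounded_prompt` are hypothetical, and real RAG systems typically use a search index or vector database instead. The sketch stops at assembling the prompt; the call to an actual language model is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; a real system would query a search index or vector store.
DOCUMENTS = {
    "doc-1": "The Eiffel Tower is in Paris and was completed in 1889.",
    "doc-2": "Mount Everest is the highest mountain above sea level.",
    "doc-3": "The Great Wall of China was built over many centuries.",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k (id, text) documents most similar to the question, best first."""
    q = Counter(tokenize(question))
    ranked = sorted(
        DOCUMENTS.items(),
        key=lambda item: cosine(q, Counter(tokenize(item[1]))),
        reverse=True,
    )
    return ranked[:k]


def build_grounded_prompt(question: str, k: int = 2) -> str:
    """Put the retrieved passages in the prompt and ask the model to stick to them."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(question, k))
    return (
        "Answer using only the sources below and cite them by id. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_grounded_prompt("When was the Eiffel Tower completed?"))
```

The instruction to admit ignorance when the sources are silent is the part aimed at hallucinations: the model is steered toward the retrieved text rather than toward filling gaps on its own.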
RAG acts as a compass for AI, keeping it connected to concrete facts while leveraging its ability to formulate fluid and natural responses. This approach is particularly relevant for models like o3 and o4-mini, whose tendency to hallucinate appears to stem from their increased autonomy.
By integrating real-time verification mechanisms, developers can limit deviations while preserving the models’ creativity.
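One very simple form such a verification mechanism could take is a post-generation check that flags answer sentences poorly supported by the retrieved sources. The Python sketch below assumes a crude lexical-overlap heuristic; the function names, stopword list and 0.6 threshold are illustrative assumptions, not a standard method. More robust systems might instead use a natural-language-inference model or explicit citation checks.

```python
import re

# Very common words ignored when measuring support (illustrative, not exhaustive).
STOPWORDS = {"the", "a", "an", "is", "was", "in", "of", "and", "to", "it"}


def content_words(text: str) -> set[str]:
    """Lowercased word tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def unsupported_sentences(answer: str, sources: list[str], threshold: float = 0.6) -> list[str]:
    """Return answer sentences whose content words are mostly absent from the sources.

    A sentence counts as supported when at least `threshold` of its content
    words appear somewhere in the retrieved sources.
    """
    source_vocab = set().union(*(content_words(s) for s in sources))
    flagged = []
    # Split the answer into sentences after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        coverage = len(words & source_vocab) / len(words)
        if coverage < threshold:
            flagged.append(sentence)
    return flagged


if __name__ == "__main__":
    sources = ["The Eiffel Tower is in Paris and was completed in 1889."]
    answer = "The Eiffel Tower was completed in 1889. It was designed by Leonardo da Vinci."
    print(unsupported_sentences(answer, sources))
```

Here the second sentence is flagged because none of its content words appear in the source. A flagged sentence is not necessarily false, only unverified, so it could be removed, re-checked, or shown to the user with a warning.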
A Balance to Be Found
AI hallucinations, though concerning, are not an insurmountable flaw. They reflect the inherent tension between creativity and reliability in artificial intelligence systems. The o3 and o4-mini models, by pushing the boundaries of artificial reasoning, highlight this paradox: the more powerful an AI is, the more it risks going astray if not guided correctly.
On our Yiaho platform, where models like GPT-4 and o1 offer robust performance with fewer hallucinations, we observe that caution in design can still outweigh boldness.
In the future, approaches like RAG, combined with more rigorous training techniques, could reconcile innovation and precision. In the meantime, users must remain vigilant and verify AI responses, especially when they come from such audacious models! After all, if AI can dream like a human, it’s up to us to remind it where imagination ends and reality begins.
Source: SciencePost


