OpenAI's newest AI models hallucinate way more, for reasons unknown

PCWorld 

Last week, OpenAI released its new o3 and o4-mini reasoning models, which perform significantly better than their o1 and o3-mini predecessors and have new capabilities like "thinking with images" and agentically combining AI tools for more complex results. However, the new models also hallucinate noticeably more often than the models they replace. This is unusual, as newer models tend to hallucinate less as the underlying AI tech improves.

In the realm of LLMs and reasoning AIs, a "hallucination" occurs when the model makes up information that sounds convincing but has no basis in truth. In other words, when you ask ChatGPT a question, it may respond with an answer that's patently false or incorrect. OpenAI's in-house benchmark PersonQA, which is used to measure the factual accuracy of its AI models when answering questions about people, found that o3 hallucinated in 33 percent of responses, while o4-mini did even worse at 48 percent.
