Advanced AI Experiences “Complete Accuracy Collapse” When Given Complex Problems

A recent study from Apple researchers has found "fundamental limitations" in advanced AI models. The research states that large reasoning models (LRMs) experience a "complete accuracy collapse" when faced with complex problems. While standard AI models performed better than LRMs on simple tasks, both collapsed when presented with problems that required higher-order reasoning.
As question complexity rose, LRMs began "reducing their reasoning effort" as they approached the collapse point.
The paper further found that the models wasted computing power on simpler problems: they arrived at correct solutions early in the reasoning process but continued exploring alternatives anyway. On moderately complex problems, the models explored incorrect solutions first before eventually reaching correct ones. And once the problems became highly complex, the models failed to produce any correct solutions at all. In one instance, a model failed even after being handed the algorithm that solves the problem.
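In the paper, that instance involved the Tower of Hanoi puzzle: the researchers report that supplying the explicit solution procedure in the prompt did not prevent the collapse. To give a sense of how mechanical that procedure is, here is a minimal sketch of the standard recursive solver (the function and variable names here are our own illustration, not the paper's prompt):

```python
def hanoi(n: int, source: str, target: str, spare: str, moves: list) -> None:
    """Append the move sequence for transferring n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear n-1 disks onto the spare peg
    moves.append((source, target))              # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on top

moves = []
hanoi(3, "A", "C", "B", moves)
print(moves)       # [('A','C'), ('A','B'), ('C','B'), ('A','C'), ('B','A'), ('B','C'), ('A','C')]
print(len(moves))  # 2**3 - 1 = 7; the move count doubles with each added disk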
The research notes, "Upon approaching a critical threshold — which closely corresponds to their accuracy collapse point — models counterintuitively begin to reduce their reasoning effort despite increasing problem difficulty."
Tested models include Google's Gemini Thinking, OpenAI's o3, Anthropic's Claude 3.7 Sonnet-Thinking, and DeepSeek-R1.
The researchers conclude that these findings suggest a "fundamental scaling limitation in the thinking capabilities of current reasoning models."