
When billion-dollar AIs break down over puzzles a child can do, it’s time to rethink the hype

Gary Marcus
Artificial intelligence (AI) · Computing · Technology · Apple · ChatGPT · World

AI Summary

TL;DR: Key points

A new research paper by Apple has taken the tech world by storm, challenging the popular notion that large language models (LLMs) and large reasoning models (LRMs) are able to reason reliably. The paper demonstrates that leading AI models like ChatGPT, Claude, and Deepseek, despite being explicitly designed for reasoning tasks, often fail at classic puzzles like the Tower of Hanoi when complexity increases, highlighting their limitations beyond the limits of their training data.

Trending
  1. 1957: Herb Simon solved problems with classical AI techniques.
  2. 1998: Gary Marcus began making arguments about neural network limitations.
  3. Recently: Apple published a research paper on LLM limitations.
  • Rethinking AI hype and capabilities
  • Potential impact on business applications of LLMs
  • Questions about the path to Artificial General Intelligence (AGI)
  • Continued use of LLMs for coding, brainstorming, and writing, but with humans in the loop
What: Apple's research paper shows that leading AI models (LLMs, LRMs) struggle with reasoning tasks, particularly classic puzzles like the Tower of Hanoi, when complexity rises beyond their training data. This debunks the hype around their reasoning capabilities.
When: Recently (paper published), since 1998 (Gary Marcus's argument), 1957 (Herb Simon's work).
Where: Tech world (general), Apple (company), Arizona State University.
Why: The paper aims to critically expose the overhyped capabilities of artificial intelligence by highlighting their limitations in reasoning, understanding, or general intelligence, especially when encountering novelty outside their training distribution.
How: Apple challenged leading generative models with classic puzzles like the Tower of Hanoi, finding that they managed seven discs with less than 80% accuracy and failed almost entirely at eight discs, and that the models' solution process was neither logical nor intelligent.
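
To make the complexity point concrete, here is a minimal sketch of the classic recursive Tower of Hanoi solution (this illustrates the puzzle itself, not Apple's experimental setup): solving n discs requires 2^n - 1 moves, so the seven-disc version the models struggled with needs 127 moves and the eight-disc version needs 255.

```python
# Minimal sketch: the standard recursive Tower of Hanoi solution.
# Peg names "A", "B", "C" are arbitrary labels used for illustration.
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the list of moves that transfers n discs from source to target."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # park the top n-1 discs on the spare peg
    moves.append((source, target))              # move the largest disc to the target peg
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 discs back on top of it
    return moves

for n in (7, 8):
    print(n, "discs:", len(hanoi(n)), "moves")  # prints 127 and 255 moves respectively
```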
