The paper documents a pattern the authors call Potemkins, a kind of reasoning inconsistency (see figure below). They show that LLMs, even models like o3, make these errors frequently.
Gary Marcus: "You can’t possibly create AGI based on machines that cannot keep consistent with their own assertions. You just can’t."

Since we're talking about Gary Marcus, let's digress for a second. Here's an amazing blog post from Gary on neurosymbolic AI:
Gary Marcus’s essay traces the decades-long debate between two main approaches in artificial intelligence:
- Symbolic AI (symbol-manipulation approach): Rooted in logic and mathematics, this tradition uses explicit rules, symbols, and databases to represent knowledge and perform reasoning.
- Neural networks (connectionist approach): Inspired by the brain, these systems learn from large amounts of data and are the foundation of today’s large language models (LLMs) like GPT.
https://garymarcus.substack.com/p/how-o3-and-grok-4-accidentally-vindicated

And more rants from him on the crisis in the industry, with talent getting swapped left and right.
