Large-scale Generative AI Models Lack Visual Number Sense

Testolin, Alberto, Hou, Kuinan, Zorzi, Marco

arXiv.org Artificial Intelligence 

Humans can readily judge the number of objects in a visual scene, even without counting, and such a skill has been documented in a variety of animal species and in babies prior to language development and formal schooling. Numerical judgments are error-free for small sets, while for larger collections responses become approximate, with variability increasing proportionally to the target number. This response pattern is observed for items of all kinds, despite variation in object features (such as color or shape), suggesting that our visual number sense relies on abstract representations of numerosity. Here, we investigated whether generative Artificial Intelligence (AI) models based on large-scale transformer architectures can reliably name the number of objects in simple visual stimuli or generate images containing a target number of items in the 1-10 range. Surprisingly, none of the foundation models considered performed in a human-like way: They all made striking errors even with small numbers, the response variability often did not increase in a systematic way, and the pattern of errors varied with object category. Our findings demonstrate that advanced AI systems still lack a basic ability that supports an intuitive understanding of numbers, which in humans is foundational for numeracy and mathematical development.