Haunted House: A text-based game for comparing the flexibility of mental models in humans and LLMs

Puppart, Brett; Paltmann, Paul-Henry; Aru, Jaan

arXiv.org Artificial Intelligence 

The advent of transformer-based large language models (LLMs) has reignited the philosophical debate over human significance, a question that has persisted for millennia. Aristotle held that the function of humans is to live according to the rational principle, which distinguishes us from other animals (Aristotle, 2014). At the time, this might have seemed a reasonable conclusion, as humans use complex language and abstract thinking to a degree that other animals simply do not. However, recent advances in artificial intelligence (AI) raise the possibility that we might one day live in a world in which our creation is more intelligent than us, or perhaps that this world is already here. On many benchmarks comparing humans and AI, LLM performance has increased rapidly. On SimpleBench, which measures common sense reasoning and social intelligence, GPT-4o scored only 17.8%, whereas o1-preview scored 41.7% (Philip & Hemang, 2024).