Assessing Consciousness-Related Behaviors in Large Language Models Using the Maze Test

Pimenta, Rui A., Schlippe, Tim, Schaaff, Kristina

arXiv.org Artificial Intelligence 

We investigate consciousness-like behaviors in Large Language Models (LLMs) using the Maze Test, which challenges models to navigate mazes from a first-person perspective. After synthesizing consciousness theories into 13 essential characteristics, we evaluated 12 leading LLMs across zero-shot, one-shot, and few-shot learning scenarios. Reasoning-capable LLMs consistently outperformed standard versions, with Gemini 2.0 Pro achieving 52.9% Complete Path Accuracy and DeepSeek-R1 reaching 80.5% Partial Path Accuracy. The gap between these two metrics indicates that LLMs struggle to maintain a coherent self-model throughout a solution, a fundamental aspect of consciousness. While LLMs show progress in consciousness-related behaviors through reasoning mechanisms, they lack the integrated, persistent self-awareness characteristic of consciousness.

The emergence of human-like capabilities in AI has been debated since the field's inception in the 1950s [1], [2]. An early case was ELIZA [3], a chatbot simulating a therapist. Though based on simple pattern matching, its responses were so convincing that Weizenbaum's secretary requested privacy for a "real conversation", showing how humans can mistakenly perceive consciousness in even the simplest AI systems.
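To make the distinction between the two reported metrics concrete, the following is a minimal Python sketch of how Complete Path Accuracy and Partial Path Accuracy could be computed over maze solutions. The metric names come from the abstract; the exact scoring rules here (exact-match for complete paths, longest correct prefix for partial paths, and the "F"/"L"/"R" move encoding) are illustrative assumptions, not the paper's definitions.

```python
from typing import List

def complete_path_accuracy(predicted: List[List[str]], gold: List[List[str]]) -> float:
    """Fraction of mazes whose predicted move sequence matches the gold path exactly."""
    exact = sum(p == g for p, g in zip(predicted, gold))
    return exact / len(gold)

def partial_path_accuracy(predicted: List[List[str]], gold: List[List[str]]) -> float:
    """Mean fraction of the gold path reproduced before the first wrong move.

    Assumption: partial credit is the length of the longest correct prefix
    divided by the gold path length, averaged over all mazes.
    """
    scores = []
    for p, g in zip(predicted, gold):
        correct = 0
        for pred_move, gold_move in zip(p, g):
            if pred_move != gold_move:
                break
            correct += 1
        scores.append(correct / len(g))
    return sum(scores) / len(scores)

# Hypothetical example: first-person moves ("F" forward, "L" turn left, "R" turn right).
gold = [["F", "L", "F"], ["F", "F", "R", "F"]]
pred = [["F", "L", "F"], ["F", "R", "R", "F"]]
print(complete_path_accuracy(pred, gold))  # 0.5   (one of two mazes solved exactly)
print(partial_path_accuracy(pred, gold))   # 0.625 (prefix scores 1.0 and 0.25, averaged)
```

Under this reading, a model can score well on Partial Path Accuracy while failing Complete Path Accuracy whenever it starts correctly but loses track of its position partway through, which is exactly the gap the abstract attributes to an unstable self-model.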