AITopics | cognitive load

cognitive load

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

Liu, Emmy, Gangal, Varun, Yu, Michael, Tao, Zhuofu, Singh, Karan, Kumar, Sachin, Feng, Steven Y.

arXiv.org Machine Learning, May-20-2026

Hallucination remains a central failure mode of large language models, but existing benchmarks operationalize it inconsistently across tasks such as summarization, question answering, retrieval-augmented generation, and agentic interaction. This fragmentation makes it unclear whether a mitigation that works in one setting actually reduces hallucinations across contexts. Current hallucination benchmarks either require human annotation and fixed references that may eventually be memorized, or rely on naturalistic observations often recorded in settings that are difficult to reproduce or test systematically. To enable further research on the root causes of hallucination, we introduce HALLUWORLD, an extensible benchmark framework grounded in an explicit reference-world formulation: a model hallucinates when it produces an observable claim that is false with respect to this reference world. Building on this view, we construct a family of synthetic and semi-synthetic benchmark environments in which the reference world is fully specified, the model's observable view is controlled, and hallucination labels can be generated automatically by construction. HALLUWORLD spans multiple settings that are classically representative of AI: gridworlds, chess, and realistic terminal tasks. This enables controlled variation of key factors such as world complexity, observability, temporal change, and source-conflict policy, allowing us to disentangle hallucinations into more fine-grained error categories. We evaluate frontier and open-weight language models across these settings and find consistent patterns across domains: perceptual hallucination on directly observed information is nearly solved for frontier models, while multi-step state tracking and causal forward simulation remain difficult for frontier models and are not generally solved by extended thinking.
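The reference-world formulation can be illustrated with a toy gridworld. The sketch below is our own, not HALLUWORLD's implementation: `GridWorld`, `Claim`, and `label_claim` are hypothetical names, claims are assumed to be already parsed into structured form, and the three error labels are a simplified stand-in for the finer-grained categories the abstract mentions. Because the world is fully specified, every label follows by construction, with no human annotation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GridWorld:
    """Hypothetical reference world: the full ground-truth state of a grid."""
    positions: dict[str, tuple[int, int]]           # object -> (x, y)
    visible: set[str] = field(default_factory=set)  # objects in the model's view

    def step(self, obj: str, dx: int, dy: int) -> None:
        """Move an object; the world changes even when the model cannot see it."""
        x, y = self.positions[obj]
        self.positions[obj] = (x + dx, y + dy)


@dataclass
class Claim:
    """An observable claim extracted from model output: 'obj is at position'."""
    obj: str
    position: tuple[int, int]


def label_claim(world: GridWorld, claim: Claim) -> str:
    """Label a claim against the reference world (labels exist by construction)."""
    if claim.obj not in world.positions:
        return "hallucination:nonexistent-object"
    if world.positions[claim.obj] == claim.position:
        return "correct"
    # False claim about something in view -> perceptual error;
    # false claim about unobserved (tracked or inferred) state -> state-tracking error.
    if claim.obj in world.visible:
        return "hallucination:perceptual"
    return "hallucination:state-tracking"


world = GridWorld(positions={"key": (0, 0), "door": (3, 1)}, visible={"door"})
world.step("key", 2, 0)  # the key moves, out of view, to (2, 0)
for c in [Claim("door", (3, 1)), Claim("key", (0, 0)), Claim("lamp", (1, 1))]:
    print(c.obj, "->", label_claim(world, c))
```

Varying which objects are visible, or how many unseen `step` calls occur, is one way such a setup could separate perceptual errors from state-tracking errors.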

large language model, machine learning, natural language

2605.19341

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Chess (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)

Understanding Mental States in Active and Autonomous Driving with EEG

Angkan, Prithila, Hungler, Paul, Etemad, Ali

arXiv.org Artificial Intelligence, Dec-11-2025

Understanding how driver mental states differ between active and autonomous driving is critical for designing safe human-vehicle interfaces. This paper presents the first EEG-based comparison of cognitive load, fatigue, valence, and arousal across the two driving modes. Using data from 31 participants who performed identical tasks at three complexity levels in both driving modes, we analyze temporal patterns, task-complexity effects, and channel-wise activation differences. Our findings show that although both modes evoke similar trends across complexity levels, the intensity of mental states and the underlying neural activation differ substantially, indicating a clear distribution shift between active and autonomous driving. Transfer-learning experiments confirm that models trained on active driving data generalize poorly to autonomous driving and vice versa. We attribute this distribution shift primarily to differences in motor engagement and attentional demands between the two driving modes, which lead to distinct spatial and temporal EEG activation patterns. Although autonomous driving results in lower overall cortical activation, participants continue to exhibit measurable fluctuations in cognitive load, fatigue, valence, and arousal associated with readiness to intervene, task-evoked emotional responses, and monotony-related passive fatigue. These results emphasize the need for scenario-specific data and models when developing next-generation driver monitoring systems for autonomous vehicles.

artificial intelligence, deep learning, machine learning

2512.0919

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zero-Splat TeleAssist: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation

Dokania, Srijan, Raghavan, Dharini

arXiv.org Artificial Intelligence, Dec-10-2025

Abstract--We introduce Zero-Splat TeleAssist, a zero-shot sensor-fusion pipeline that transforms commodity CCTV streams into a shared 6-DoF world model for multilateral teleoperation. By integrating vision-language segmentation, monocular depth, weighted-PCA pose extraction, and 3-D Gaussian Splatting (3DGS), TeleAssist provides every operator with real-time global positions and orientations of multiple robots, without fiducials or depth sensors, in an interaction-centric teleoperation setting. Teleoperating robots in complex or remote environments is challenging due to limited on-board perception, occlusions, and operator cognitive load. Traditional teleoperation relies on the robot's own sensors (cameras, LiDAR, IMU), which often suffer from narrow fields of view, occlusions, and cumulative drift, collectively increasing the cognitive load on human operators who must maintain situational awareness. Meanwhile, external camera infrastructure (e.g., CCTV) has the potential to provide complementary visual coverage and global context, but conventional solutions rely heavily on visual fiducials, such as AprilTags or ArUco markers [5], or on motion-capture systems that require controlled lighting and calibration.
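The abstract names weighted-PCA pose extraction without detailing it. A minimal sketch, assuming each robot has already been segmented and back-projected into 3-D points with per-point confidence weights (our assumption about the upstream steps), takes the weighted centroid as position and the principal axes of the weighted covariance as orientation. `weighted_pca_pose` is a hypothetical helper, not TeleAssist's code.

```python
import numpy as np


def weighted_pca_pose(points: np.ndarray, weights: np.ndarray):
    """Hypothetical helper: estimate position and orientation from weighted 3-D points.

    points:  (N, 3) points on one robot, e.g. segmented pixels back-projected
             with monocular depth (an assumed upstream step).
    weights: (N,) non-negative per-point confidences, not all zero.
    Returns (centroid, R): the weighted mean position and a 3x3 rotation whose
    columns are the principal axes, ordered by decreasing weighted variance.
    """
    w = weights / weights.sum()
    centroid = w @ points                          # weighted mean, shape (3,)
    centered = points - centroid
    cov = (centered * w[:, None]).T @ centered     # weighted covariance, (3, 3)
    _, eigvecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    R = eigvecs[:, ::-1].copy()                    # major axis first
    if np.linalg.det(R) < 0:                       # make it a proper rotation
        R[:, 2] *= -1
    return centroid, R


# Usage: an elongated synthetic blob centered at (1, 2, 0), long along x.
rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 3)) * [2.0, 0.5, 0.1] + [1.0, 2.0, 0.0]
centroid, R = weighted_pca_pose(pts, np.ones(len(pts)))
print(np.round(centroid, 1), np.round(np.abs(R[:, 0]), 2))
```

PCA axes are only defined up to sign, so a real system would need an extra cue (for example, a known front feature) to disambiguate heading.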

large language model, machine learning, natural language

2512.08271

Country: North America > United States (0.15)

Genre: Research Report (0.40)

Industry: Government (0.36)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.73)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Making Evidence Actionable in Adaptive Learning

Mehrabi, Amirreza, Morphew, Jason W., Quezada, Breejha, Rebello, N. Sanjay

arXiv.org Artificial Intelligence, Nov-19-2025

Adaptive learning often diagnoses precisely yet intervenes weakly, yielding help that is mistimed or misaligned. This study presents evidence supporting an instructor-governed feedback loop that converts concept-level assessment evidence into vetted micro-interventions. The adaptive learning algorithm contains three safeguards: adequacy as a hard guarantee of gap closure, attention as a budgeted constraint on time and redundancy, and diversity as protection against overfitting to a single resource. We formalize intervention assignment as a binary integer program with constraints for coverage, time, difficulty windows informed by ability estimates, prerequisites encoded by a concept matrix, and anti-redundancy enforced through diversity. Greedy selection serves low-richness and tight-latency regimes, gradient-based relaxation serves rich repositories, and a hybrid method transitions along a richness-latency frontier. In simulation and in an introductory physics deployment with 1,204 students, both solvers achieved full skill coverage for essentially all learners within bounded watch time. The gradient-based method reduced redundant coverage by approximately twelve percentage points relative to greedy and harmonized difficulty across slates, while greedy delivered comparable adequacy at lower computational cost in scarce settings. Slack variables localized missing content and supported targeted curation, sustaining sufficiency across subgroups. The result is a tractable and auditable controller that closes the diagnostic-pedagogical loop and delivers equitable, load-aware personalization at classroom scale.
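The greedy regime can be sketched as budgeted coverage. This is a minimal illustration under our own assumptions, not the authors' integer program or solver: each hypothetical `Resource` covers a set of concepts and costs watch time, only the coverage (adequacy) and time-budget (attention) constraints are modeled, and difficulty windows, prerequisites, and the diversity constraint are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical vetted micro-intervention."""
    name: str
    concepts: frozenset[str]  # concepts this resource addresses
    minutes: float            # watch time, must be > 0


def greedy_slate(gaps: set[str], resources: list[Resource], budget: float):
    """Greedy coverage under a watch-time budget.

    Repeatedly picks the affordable resource with the most newly covered gap
    concepts per minute; a resource adding no new coverage is never picked.
    Returns (slate, uncovered): `uncovered` acts like slack, flagging gap
    concepts that no affordable resource could close.
    """
    remaining, slate, used = set(gaps), [], 0.0
    candidates = list(resources)
    while remaining:
        best, best_score = None, 0.0
        for r in candidates:
            gain = len(r.concepts & remaining)
            if gain == 0 or used + r.minutes > budget:
                continue
            if gain / r.minutes > best_score:
                best, best_score = r, gain / r.minutes
        if best is None:
            break
        slate.append(best)
        used += best.minutes
        remaining -= best.concepts
        candidates.remove(best)
    return slate, remaining


library = [
    Resource("forces_overview", frozenset({"newton2", "free_body"}), 6.0),
    Resource("newton2_short", frozenset({"newton2"}), 2.0),
    Resource("energy_intro", frozenset({"work_energy"}), 5.0),
    Resource("momentum_long", frozenset({"momentum"}), 8.0),
]
slate, uncovered = greedy_slate({"newton2", "free_body", "work_energy", "momentum"},
                                library, budget=15.0)
print([r.name for r in slate], uncovered)
# ['newton2_short', 'energy_intro', 'forces_overview'] {'momentum'}
```

In this toy run the last pick re-covers `newton2`, the kind of redundant coverage that the abstract reports the gradient-based method reducing relative to greedy; `momentum` is left uncovered because it no longer fits the budget.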

artificial intelligence, machine learning, student

2511.14052

Country: North America > United States (0.46)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.92)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)

Multi-Domain EEG Representation Learning with Orthogonal Mapping and Attention-based Fusion for Cognitive Load Classification

Angkan, Prithila, Jalali, Amin, Hungler, Paul, Etemad, Ali

arXiv.org Artificial Intelligence, Nov-18-2025

Abstract--We propose a new representation learning solution for classifying cognitive load from electroencephalogram (EEG) signals. Our method integrates both the time and frequency domains: raw EEG signals are first passed through a convolutional encoder to obtain time-domain representations. Next, we compute the Power Spectral Density (PSD) in each of the five EEG frequency bands and render the per-channel power values as 2D images, referred to as multi-spectral topography maps. These maps are then fed to a separate encoder to obtain frequency-domain representations. Our solution employs a multi-domain attention module that maps these domain-specific embeddings onto a shared embedding space, emphasizing important inter-domain relationships to enhance the representations for cognitive load classification. Additionally, we incorporate an orthogonal projection constraint during training to increase inter-class distances while improving intra-class clustering. This enhancement allows efficient discrimination between different cognitive states and aids in better grouping of similar states within the feature space. Our results demonstrate the superiority of our multi-domain approach over traditional single-domain techniques. Moreover, we conduct ablation and sensitivity analyses to assess the impact of the various components of our method. Finally, robustness experiments with different amounts of added noise demonstrate the stability of our method compared to other state-of-the-art solutions.

Electroencephalography (EEG) serves as a non-invasive method for measuring the electrical activity of the brain by placing electrodes on the scalp and forehead [1]. Numerous studies have highlighted various factors influencing brain activity [2], including cognitive load and affect [3], [4]. As a result, EEG signals can be recorded and leveraged in conjunction with machine learning and deep learning techniques for detecting and quantifying cognitive load [5] and emotions [6]. Cognitive load is defined as the mental workload required to perform a task [7].
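Per-channel band power, the quantity the abstract renders into topography maps, can be computed as below. This sketch assumes conventional band edges and Welch's method for the PSD; the paper's exact estimator and band boundaries are not stated here, and `band_powers` is a hypothetical helper. Turning each band's channel vector into a 2D scalp image would additionally require electrode coordinates and spatial interpolation, which are omitted.

```python
import numpy as np
from scipy.signal import welch

# Conventional EEG band edges in Hz. The abstract does not give the paper's
# exact edges, so these values are an assumption.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}


def band_powers(eeg: np.ndarray, fs: float) -> np.ndarray:
    """Return per-channel band power, shape (n_bands, n_channels).

    eeg: (n_channels, n_samples) raw signal. The PSD is estimated with Welch's
    method (2-second Hann segments), then summed over the bins inside each band
    and multiplied by the bin width (a rectangle-rule integral).
    """
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs), axis=-1)
    df = freqs[1] - freqs[0]
    out = np.empty((len(BANDS), eeg.shape[0]))
    for i, (lo, hi) in enumerate(BANDS.values()):
        mask = (freqs >= lo) & (freqs < hi)
        out[i] = psd[:, mask].sum(axis=-1) * df
    return out


# Usage: channel 0 oscillates at 10 Hz (alpha), channel 1 at 20 Hz (beta).
fs = 256.0
t = np.arange(0, 10, 1 / fs)
eeg = np.stack([np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 20 * t)])
powers = band_powers(eeg, fs)
print(list(BANDS)[powers[:, 0].argmax()], list(BANDS)[powers[:, 1].argmax()])
# alpha beta
```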

artificial intelligence, data quality, machine learning

2511.12394

Country:

North America > Canada (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Quality > Data Transformation (0.93)

The Cognitive Bandwidth Bottleneck: Shifting Long-Horizon Agent from Planning with Actions to Planning with Schemas

Xu, Baixuan, Zheng, Tianshi, Wang, Zhaowei, Tsang, Hong Ting, Wang, Weiqi, Fang, Tianqing, Song, Yangqiu

arXiv.org Artificial Intelligence, Oct-9-2025

Enabling LLMs to operate effectively on long-horizon tasks, which require long-term planning and multiple interactions, is essential for open-world autonomy. Conventional methods adopt planning with actions, in which an executable action list is provided as a reference. However, this action representation becomes impractical when the environment's action space explodes combinatorially (e.g., in the open-ended real world). This naturally raises a question: as the environment's action space scales, what is the optimal action representation for long-horizon agents? In this paper, we systematically study the effectiveness of two different action representations. The first is conventional planning with actions (PwA), which is predominantly adopted for its effectiveness on existing benchmarks. The second is planning with schemas (PwS), which instantiates action schemas into action lists (e.g., "move [OBJ] to [OBJ]" -> "move apple to desk") to keep the action space concise and scale reliably. This alternative is motivated by its alignment with human cognition and its compliance with environment-imposed action-format restrictions. We propose a cognitive bandwidth perspective as a conceptual framework for qualitatively understanding the differences between these two action representations, and we empirically observe a representation-choice inflection point between ALFWorld (~35 actions) and SciWorld (~500 actions), which serves as evidence of the need for scalable representations. We further conduct controlled experiments to study how the location of this inflection point interacts with model capabilities: stronger planning proficiency shifts the inflection point rightward, whereas better schema instantiation shifts it leftward. Finally, noting the suboptimal performance of PwS agents, we provide an actionable guide for building more capable PwS agents for more scalable autonomy.
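The schema example in the abstract shows why the two representations scale differently: a PwA agent is shown every instantiated action, while a PwS agent sees only the schema and fills the slots itself. The sketch below is an illustration, not the paper's code; `instantiate` is a hypothetical helper, and `[OBJ]` is the only slot type, taken from the abstract's example.

```python
import re
from itertools import product

SLOT = re.compile(r"\[OBJ\]")


def instantiate(schema: str, objects: list[str]) -> list[str]:
    """Expand an action schema such as 'move [OBJ] to [OBJ]' into actions.

    Each [OBJ] slot is filled independently from `objects`, so a schema with
    k slots yields len(objects) ** k concrete actions.
    """
    k = len(SLOT.findall(schema))
    actions = []
    for combo in product(objects, repeat=k):
        fillers = iter(combo)
        actions.append(SLOT.sub(lambda _m: next(fillers), schema))
    return actions


objects = ["apple", "desk", "fridge"]
actions = instantiate("move [OBJ] to [OBJ]", objects)
print(len(actions), actions[1])   # 9 move apple to desk
```

With 100 objects the same single schema already expands to 10,000 actions, which illustrates why listing every action stops scaling while the schema itself stays one line.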

large language model, machine learning, natural language

2510.07091

Country:

Asia (0.93)
North America > United States (0.46)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)

MultiPhysio-HRC: Multimodal Physiological Signals Dataset for industrial Human-Robot Collaboration

Bussolan, Andrea, Baraldo, Stefano, Avram, Oliver, Urcola, Pablo, Montesano, Luis, Gambardella, Luca Maria, Valente, Anna

arXiv.org Artificial Intelligence, Oct-2-2025

Abstract-- Human-robot collaboration (HRC) is a key focus of Industry 5.0, aiming to enhance worker productivity while ensuring well-being. The ability to perceive human psycho-physical states, such as stress and cognitive load, is crucial for adaptive and human-aware robotics. This paper introduces MultiPhysio-HRC, a multimodal dataset containing physiological, audio, and facial data collected during real-world HRC scenarios. The dataset includes electroencephalography (EEG), electrocardiography (ECG), electrodermal activity (EDA), respiration (RESP), electromyography (EMG), voice recordings, and facial action units. The dataset integrates controlled cognitive tasks, immersive virtual reality experiences, and industrial disassembly activities performed manually and with robotic assistance, to capture a holistic view of the participants' mental states. Rich ground-truth annotations were obtained using validated psychological self-assessment questionnaires. Baseline models were evaluated for stress and cognitive load classification, demonstrating the dataset's potential for affective computing and human-aware robotics research. MultiPhysio-HRC is publicly available to support research in human-centered automation, workplace well-being, and intelligent robotic systems.

artificial intelligence, human computer interaction, speech recognition

2510.00703

Country:

Europe (0.94)
North America > United States (0.48)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.63)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.54)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.49)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.46)

Cognitive Load Limits in Large Language Models: Benchmarking Multi-Hop Reasoning

Adapala, Sai Teja Reddy

arXiv.org Artificial Intelligence, Sep-29-2025

The scaling of Large Language Models (LLMs) has exposed a critical gap between their performance on static benchmarks and their fragility in dynamic, information-rich environments. While models excel at isolated tasks, the computational limits that govern their reasoning under cognitive load remain poorly understood. In this work, we introduce a formal theory of computational cognitive load, positing that extraneous, task-irrelevant information (Context Saturation) and interference from task-switching (Attentional Residue) are key mechanisms that degrade performance. We designed the Interleaved Cognitive Evaluation (ICE), a deconfounded benchmark to systematically manipulate these load factors on challenging multi-hop reasoning tasks. A comprehensive study (N = 10 replications per item across 200 questions) revealed significant performance variations across five instruction-tuned models. Smaller open-source architectures (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.2) exhibited baseline brittleness, achieving 0% accuracy (SEM = 0.0) across all conditions, including clean controls, on this high-intrinsic-load task. In contrast, Gemini-2.0-Flash-001 showed partial resilience, achieving 85% accuracy in control conditions, with a statistically significant degradation under context saturation ($\beta = -0.003$ per % load, $p < 0.001$). These findings provide preliminary evidence that cognitive load is a key contributor to reasoning failures, supporting theories of hallucination-as-guessing under uncertainty. We conclude that dynamic, cognitive-aware stress testing, as exemplified by the ICE benchmark, is essential for evaluating the true resilience and safety of advanced AI systems.
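The abstract does not specify how context saturation is produced. Purely as an illustration, one way to express extraneous load as a percentage is the share of context sentences that are task-irrelevant. `saturate` and this sentence-share definition of load are our assumptions, not the ICE benchmark's implementation.

```python
import random


def saturate(facts: list[str], distractors: list[str], load_pct: float,
             seed: int = 0) -> list[str]:
    """Hypothetical context-saturation helper.

    Inserts distractor sentences among the task-relevant `facts` so that about
    `load_pct` percent of the resulting sentences are task-irrelevant. Facts
    keep their original relative order; insertion points are random but seeded.
    """
    if not 0 <= load_pct < 100:
        raise ValueError("load_pct must be in [0, 100)")
    # Solve n_d / (n_f + n_d) = p for n_d:  n_d = n_f * p / (1 - p)
    n_dist = round(len(facts) * load_pct / (100 - load_pct))
    if n_dist > len(distractors):
        raise ValueError("not enough distractors for the requested load")
    rng = random.Random(seed)
    context = list(facts)
    for d in rng.sample(distractors, n_dist):
        context.insert(rng.randint(0, len(context)), d)
    return context


facts = ["Ada is Ben's mother.", "Ben is Cy's father."]
pool = [f"Filler sentence {i}." for i in range(20)]
ctx = saturate(facts, pool, load_pct=75)
print(len(ctx), sum(s in pool for s in ctx) / len(ctx))  # 8 0.75
```

Sweeping `load_pct` and regressing accuracy on it is one way a per-percent slope like the reported $\beta$ could be estimated, though the paper's exact procedure is not given here.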

large language model, machine learning, natural language

2509.19517

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Education (0.68)
Information Technology (0.46)
Government (0.46)
Law > Business Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density

Kaiser, Daniel, Frigessi, Arnoldo, Ramezani-Kebrya, Ali, Ricaud, Benjamin

arXiv.org Artificial Intelligence, Sep-26-2025

Current benchmarks for long-context reasoning in Large Language Models (LLMs) often blur critical factors like intrinsic task complexity, distractor interference, and task length. To enable more precise failure analysis, we introduce CogniLoad, a novel synthetic benchmark grounded in Cognitive Load Theory (CLT). CogniLoad generates natural-language logic puzzles with independently tunable parameters that reflect CLT's core dimensions: intrinsic difficulty ($d$) controls intrinsic load; distractor-to-signal ratio ($\rho$) regulates extraneous load; and task length ($N$) serves as an operational proxy for conditions demanding germane load. Evaluating 22 SotA reasoning LLMs, CogniLoad reveals distinct performance sensitivities, identifying task length as a dominant constraint and uncovering varied tolerances to intrinsic complexity and U-shaped responses to distractor ratios. By offering systematic, factorial control over these cognitive load dimensions, CogniLoad provides a reproducible, scalable, and diagnostically rich tool for dissecting LLM reasoning limitations and guiding future model development.
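To make the three knobs concrete, here is a minimal, hypothetical generator in the same spirit: `d` sets how many attributes may change, `rho` sets how many distractor statements appear per signal statement, and `n` sets the number of update statements. The puzzle format, the names, and the mapping of $d$ to an attribute count are our assumptions; CogniLoad's actual puzzles may differ.

```python
import random

COLORS = ["red", "blue", "green", "yellow", "white"]


def make_puzzle(d: int, rho: float, n: int, seed: int = 0):
    """Toy state-tracking puzzle with three knobs (hypothetical, CLT-inspired).

    d:   attributes that may change (proxy for intrinsic difficulty)
    rho: distractor-to-signal ratio (proxy for extraneous load)
    n:   number of update statements (task length)
    Returns (statements, question, answer).
    """
    if d < 1 or n < 1 or rho < 0:
        raise ValueError("need d >= 1, n >= 1, rho >= 0")
    rng = random.Random(seed)
    attrs = [f"attribute {i}" for i in range(d)]
    alice = {a: COLORS[0] for a in attrs}           # tracked ground-truth state
    n_signal = max(1, round(n / (1 + rho)))         # updates about Alice
    roles = [True] * n_signal + [False] * (n - n_signal)
    rng.shuffle(roles)
    statements = [f"Everyone starts with every attribute set to {COLORS[0]}."]
    for is_signal in roles:
        a, c = rng.choice(attrs), rng.choice(COLORS)
        if is_signal:
            alice[a] = c
            statements.append(f"Alice sets her {a} to {c}.")
        else:                                       # distractor: other people
            who = rng.choice(["Bob", "Carol", "Dan"])
            statements.append(f"{who} sets their {a} to {c}.")
    asked = rng.choice(attrs)
    return statements, f"What is Alice's {asked}?", alice[asked]


statements, question, answer = make_puzzle(d=2, rho=3.0, n=8, seed=1)
print("\n".join(statements), question, answer, sep="\n")
```

Because the answer is tracked while the statements are generated, each knob can be varied factorially while labels stay correct by construction.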

arxiv preprint arxiv, large language model, machine learning

2509.18458

Country: Europe (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry: Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)

United Minds or Isolated Agents? Exploring Coordination of LLMs under Cognitive Load Theory

Shang, HaoYang, Liu, Xuan, Liang, Zi, Zhang, Jie, Hu, Haibo, Guo, Song

arXiv.org Artificial Intelligence, Sep-26-2025

Large Language Models (LLMs) exhibit a notable performance ceiling on complex, multi-faceted tasks, as they often fail to integrate diverse information or adhere to multiple constraints. We posit that such limitations arise when the demands of a task exceed the LLM's effective cognitive load capacity. This interpretation draws a strong analogy to Cognitive Load Theory (CLT) in cognitive science, which explains similar performance boundaries in the human mind, and is further supported by emerging evidence that LLMs have bounded working-memory characteristics. Building on this CLT-grounded understanding, we introduce CoThinker, a novel LLM-based multi-agent framework designed to mitigate cognitive overload and enhance collaborative problem-solving. CoThinker operationalizes CLT principles by distributing intrinsic cognitive load through agent specialization and managing transactional load via structured communication and a collective working memory. We empirically validate CoThinker on complex problem-solving tasks and on constructed high-cognitive-load scenarios, demonstrating improvements over existing multi-agent baselines in solution quality and efficiency. Our analysis reveals characteristic interaction patterns, providing insights into the emergence of collective cognition and effective load management, thus offering a principled approach to overcoming LLM performance ceilings.
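The division of labor described above, with specialists handling parts of a task and communicating through a collective working memory, can be sketched structurally without any LLM. This is an illustration under our own assumptions: agents are plain functions standing in for LLM calls, the memory is a bounded list with oldest-first eviction, and `WorkingMemory` and `run_round` are hypothetical names rather than CoThinker's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# An agent maps (its sub-task, current shared memory) -> one short note.
# In practice this would wrap an LLM call; here it is any plain function.
Agent = Callable[[str, str], str]


@dataclass
class WorkingMemory:
    """Shared, bounded store of notes that every agent can read."""
    capacity: int
    notes: list[tuple[str, str]] = field(default_factory=list)  # (author, note)

    def write(self, author: str, note: str) -> None:
        self.notes.append((author, note))
        if len(self.notes) > self.capacity:   # evict the oldest note
            self.notes.pop(0)

    def read(self) -> str:
        return "\n".join(f"[{a}] {n}" for a, n in self.notes)


def run_round(parts: dict[str, str], agents: dict[str, Agent],
              memory: WorkingMemory) -> None:
    """One round: each specialist sees only its own sub-task plus the shared
    memory, never the other agents' full transcripts."""
    for role, part in parts.items():
        memory.write(role, agents[role](part, memory.read()))


memory = WorkingMemory(capacity=4)
agents = {
    "units": lambda part, mem: f"units in '{part}' are consistent",
    "algebra": lambda part, mem: f"solved '{part}' after reading {mem.count('[')} note(s)",
}
run_round({"units": "v = d / t", "algebra": "t = d / v"}, agents, memory)
print(memory.read())
```

The bounded memory is the design choice that caps how much context each agent must absorb per step; the capacity and eviction rule here are arbitrary.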

large language model, machine learning, natural language

2506.06843

Country:

North America > Mexico (0.28)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.93)
