Goto

Collaborating Authors

 cognitive impairment


Physics-Informed Neural Koopman Machine for Interpretable Longitudinal Personalized Alzheimer's Disease Forecasting

Hrusanov, Georgi, Vu, Duy-Thanh, Can, Duy-Cat, Tascedda, Sophie, Ryan, Margaret, Bodelet, Julien, Koscielska, Katarzyna, Magnus, Carsten, Chén, Oliver Y.

arXiv.org Artificial Intelligence

Early forecasting of individual cognitive decline in Alzheimer's disease (AD) is central to disease evaluation and management. Despite advances, it is as of yet challenging for existing methodological frameworks to integrate multimodal data for longitudinal personalized forecasting while maintaining interpretability. To address this gap, we present the Neural Koopman Machine (NKM), a new machine learning architecture inspired by dynamical systems and attention mechanisms, designed to forecast multiple cognitive scores simultaneously using multimodal genetic, neuroimaging, proteomic, and demographic data. NKM integrates analytical ($α$) and biological ($β$) knowledge to guide feature grouping and control the hierarchical attention mechanisms to extract relevant patterns. By implementing Fusion Group-Aware Hierarchical Attention within the Koopman operator framework, NKM transforms complex nonlinear trajectories into interpretable linear representations. To demonstrate NKM's efficacy, we applied it to study the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Our results suggest that NKM consistently outperforms both traditional machine learning methods and deep learning models in forecasting trajectories of cognitive decline. Specifically, NKM (1) forecasts changes of multiple cognitive scores simultaneously, (2) quantifies differential biomarker contributions to predicting distinctive cognitive scores, and (3) identifies brain regions most predictive of cognitive deterioration. Together, NKM advances personalized, interpretable forecasting of future cognitive decline in AD using past multimodal data through an explainable, explicit system and reveals potential multimodal biological underpinnings of AD progression.


National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech -- The SpeechCARE Solution

Zolnoori, Maryam, Azadmaleki, Hossein, Haghbin, Yasaman, Zolnour, Ali, Nezhad, Mohammad Javad Momeni, Rashidi, Sina, Naserian, Mehdi, Esmaeili, Elyas, Arpanahi, Sepehr Karimi

arXiv.org Artificial Intelligence

Alzheimer's disease and related dementias (ADRD) affect one in five adults over 60, yet more than half of individuals with cognitive decline remain undiagnosed. Speech-based assessments show promise for early detection, as phonetic motor planning deficits alter acoustic features (e.g., pitch, tone), while memory and language impairments lead to syntactic and semantic errors. However, conventional speech-processing pipelines with hand-crafted features or general-purpose audio classifiers often exhibit limited performance and generalizability. To address these limitations, we introduce SpeechCARE, a multimodal speech processing pipeline that leverages pretrained, multilingual acoustic and linguistic transformer models to capture subtle speech-related cues associated with cognitive impairment. Inspired by the Mixture of Experts (MoE) paradigm, SpeechCARE employs a dynamic fusion architecture that weights transformer-based acoustic, linguistic, and demographic inputs, allowing integration of additional modalities (e.g., social factors, imaging) and enhancing robustness across diverse tasks. Its robust preprocessing includes automatic transcription, large language model (LLM)-based anomaly detection, and task identification. A SHAP-based explainability module and LLM reasoning highlight each modality's contribution to decision-making. SpeechCARE achieved AUC = 0.88 and F1 = 0.72 for classifying cognitively healthy, MCI, and AD individuals, with AUC = 0.90 and F1 = 0.62 for MCI detection. Bias analysis showed minimal disparities, except for adults over 80. Mitigation techniques included oversampling and weighted loss. Future work includes deployment in real-world care settings (e.g., VNS Health, Columbia ADRC) and EHR-integrated explainability for underrepresented populations in New York City.


LLMCARE: early detection of cognitive impairment via transformer models enhanced by LLM-generated synthetic data

Zolnour, Ali, Azadmaleki, Hossein, Haghbin, Yasaman, Taherinezhad, Fatemeh, Nezhad, Mohamad Javad Momeni, Rashidi, Sina, Khani, Masoud, Taleban, AmirSajjad, Sani, Samin Mahdizadeh, Dadkhah, Maryam, Noble, James M., Bakken, Suzanne, Yaghoobzadeh, Yadollah, Vahabie, Abdol-Hossein, Rouhizadeh, Masoud, Zolnoori, Maryam

arXiv.org Artificial Intelligence

Alzheimer's disease and related dementias(ADRD) affect nearly five million older adults in the United States, yet more than half remain undiagnosed. Speech-based natural language processing(NLP) offers a scalable approach for detecting early cognitive decline through subtle linguistic markers that may precede clinical diagnosis. This study develops and evaluates a speech-based screening pipeline integrating transformer embeddings with handcrafted linguistic features, synthetic augmentation using large language models(LLMs), and benchmarking of unimodal and multimodal classifiers. External validation assessed generalizability to a MCI-only cohort. Transcripts were drawn from the ADReSSo 2021 benchmark dataset(n=237, Pitt Corpus) and the DementiaBank Delaware corpus(n=205, MCI vs. controls). Ten transformer models were tested under three fine-tuning strategies. A late-fusion model combined embeddings from the top transformer with 110 linguistic features. Five LLMs(LLaMA8B/70B, MedAlpaca7B, Ministral8B,GPT-4o) generated label-conditioned synthetic speech for augmentation, and three multimodal LLMs(GPT-4o,Qwen-Omni,Phi-4) were evaluated in zero-shot and fine-tuned modes. On ADReSSo, the fusion model achieved F1=83.3(AUC=89.5), outperforming transformer-only and linguistic baselines. MedAlpaca7B augmentation(2x) improved F1=85.7, though larger scales reduced gains. Fine-tuning boosted unimodal LLMs(MedAlpaca7B F1=47.7=>78.7), while multimodal models performed lower (Phi-4=71.6;GPT-4o=67.6). On Delaware, the fusion plus 1x MedAlpaca7B model achieved F1=72.8(AUC=69.6). Integrating transformer and linguistic features enhances ADRD detection. LLM-based augmentation improves data efficiency but yields diminishing returns, while current multimodal models remain limited. Validation on an independent MCI cohort supports the pipeline's potential for scalable, clinically relevant early screening.


Predicting Cognitive Assessment Scores in Older Adults with Cognitive Impairment Using Wearable Sensors

Habadi, Assma, Zefran, Milos, Yin, Lijuan, Song, Woojin, Caceres, Maria, Hu, Elise, Muramatsu, Naoko

arXiv.org Artificial Intelligence

Background and Objectives: This paper focuses on using AI to assess the cognitive function of older adults with mild cognitive impairment or mild dementia using physiological data provided by a wearable device. Cognitive screening tools are disruptive, time-consuming, and only capture brief snapshots of activity. Wearable sensors offer an attractive alternative by continuously monitoring physiological signals. This study investigated whether physiological data can accurately predict scores on established cognitive tests. Research Design and Methods: We recorded physiological signals from 23 older adults completing three NIH Toolbox Cognitive Battery tests, which assess working memory, processing speed, and attention. The Empatica EmbracePlus, a wearable device, measured blood volume pulse, skin conductance, temperature, and movement. Statistical features were extracted using wavelet-based and segmentation methods. We then applied supervised learning and validated predictions via cross-validation, hold-out testing, and bootstrapping. Results: Our models showed strong performance with Spearman's ρof 0.73-0.82 and mean absolute errors of 0.14-0.16, significantly outperforming a naive mean predictor. Sensor roles varied: heart-related signals combined with movement and temperature best predicted working memory, movement paired with skin conductance was most informative for processing speed, and heart in tandem with skin conductance worked best for attention. Discussion and Implications: These findings suggest that wearable sensors paired with AI tools such as supervised learning and feature engineering can noninvasively track specific cognitive functions in older adults, enabling continuous monitoring. Our study demonstrates how AI can be leveraged when the data sample is small. This approach may support remote assessments and facilitate clinical interventions.


Cross-Enhanced Multimodal Fusion of Eye-Tracking and Facial Features for Alzheimer's Disease Diagnosis

Nie, Yujie, Ni, Jianzhang, Ye, Yonglong, Zhang, Yuan-Ting, Wing, Yun Kwok, Xu, Xiangqing, Ma, Xin, Fan, Lizhou

arXiv.org Artificial Intelligence

Accurate diagnosis of Alzheimer's disease (AD) is essential for enabling timely intervention and slowing disease progression. Multimodal diagnostic approaches offer considerable promise by integrating complementary information across behavioral and perceptual domains. Eye-tracking and facial features, in particular, are important indicators of cognitive function, reflecting attentional distribution and neurocognitive state. However, few studies have explored their joint integration for auxiliary AD diagnosis. In this study, we propose a multimodal cross-enhanced fusion framework that synergistically leverages eye-tracking and facial features for AD detection. The framework incorporates two key modules: (a) a Cross-Enhanced Fusion Attention Module (CEF AM), which models inter-modal interactions through cross-attention and global enhancement, and (b) a Direction-Aware Convolution Module (DACM), which captures fine-grained directional facial features via horizontal-vertical receptive fields. To support this work, we constructed a synchronized multimodal dataset, including 25 patients with AD and 25 healthy controls (HC), by recording aligned facial video and eye-tracking sequences during a visual memory-search paradigm, providing an ecologically valid resource for evaluating integration strategies. Extensive experiments on this dataset demonstrate that our framework outperforms traditional late fusion and feature concatenation methods, achieving a classification accuracy of 95.11% in distinguishing AD from HC, highlighting superior robustness and diagnostic performance by explicitly modeling inter-modal dependencies and modality-specific contributions. Introduction Alzheimer's disease (AD), a progressive and irreversible neurodegenera-tive disorder, represents the primary cause of dementia in older adults [1]. It typically begins with mild memory loss and gradually progresses to severe impairments in executive and cognitive functions [2]. Within the global aging population, more than 150 million people worldwide will be affected by AD or other forms of dementia [3], imposing a substantial burden on both families and healthcare systems. Early and accurate identification of Alzheimer's disease is vital to initiate interventions that may slow progression and improve quality of life. Clinically, the diagnosis of AD primarily relies on biomarker analysis, neu-roimaging techniques, and neuropsychological assessments.


Leveraging LLMs for Early Alzheimer's Prediction

Songdechakraiwut, Tananun

arXiv.org Artificial Intelligence

We present a connectome-informed LLM framework that encodes dynamic fMRI connectivity as temporal sequences, applies robust normalization, and maps these data into a representation suitable for a frozen pre-trained LLM for clinical prediction. Applied to early Alzheimer's detection, our method achieves sensitive prediction with error rates well below clinically recognized margins, with implications for timely Alzheimer's intervention.


CogBench: A Large Language Model Benchmark for Multilingual Speech-Based Cognitive Impairment Assessment

Feng, Rui, Luo, Zhiyao, Wang, Wei, Song, Yuting, Liu, Yong, Zhu, Tingting, Li, Jianqing, Wang, Xingyao

arXiv.org Artificial Intelligence

Automatic assessment of cognitive impairment from spontaneous speech offers a promising, non-invasive avenue for early cognitive screening. However, current approaches often lack generalizability when deployed across different languages and clinical settings, limiting their practical utility. In this study, we propose CogBench, the first benchmark designed to evaluate the cross-lingual and cross-site generalizability of large language models (LLMs) for speech-based cognitive impairment assessment. Using a unified multimodal pipeline, we evaluate model performance on three speech datasets spanning English and Mandarin: ADReSSo, NCMMSC2021-AD, and a newly collected test set, CIR-E. Our results show that conventional deep learning models degrade substantially when transferred across domains. In contrast, LLMs equipped with chain-of-thought prompting demonstrate better adaptability, though their performance remains sensitive to prompt design. Furthermore, we explore lightweight fine-tuning of LLMs via Low-Rank Adaptation (LoRA), which significantly improves generalization in target domains. These findings offer a critical step toward building clinically useful and linguistically robust speech-based cognitive assessment tools.


Interpretable Machine Learning for Cognitive Aging: Handling Missing Data and Uncovering Social Determinant

Mao, Xi, Wang, Zhendong, Li, Jingyu, Mao, Lingchao, Essien, Utibe, Wang, Hairong, Ni, Xuelei Sherry

arXiv.org Artificial Intelligence

Early detection of Alzheimer's disease (AD) is crucial because its neurodegenerative effects are irreversible, and neuropathologic and social-behavioral risk factors accumulate years before diagnosis. Identifying higher-risk individuals earlier enables prevention, timely care, and equitable resource allocation. We predict cognitive performance from social determinants of health (SDOH) using the NIH NIA-supported PREPARE Challenge Phase 2 dataset derived from the nationally representative Mex-Cog cohort of the 2003 and 2012 Mexican Health and Aging Study (MHAS). Data: The target is a validated composite cognitive score across seven domains-orientation, memory, attention, language, constructional praxis, and executive function-derived from the 2016 and 2021 MHAS waves. Predictors span demographic, socioeconomic, health, lifestyle, psychosocial, and healthcare access factors. Methodology: Missingness was addressed with a singular value decomposition (SVD)-based imputation pipeline treating continuous and categorical variables separately. This approach leverages latent feature correlations to recover missing values while balancing reliability and scalability. After evaluating multiple methods, XGBoost was chosen for its superior predictive performance. Results and Discussion: The framework outperformed existing methods and the data challenge leaderboard, demonstrating high accuracy, robustness, and interpretability. SHAP-based post hoc analysis identified top contributing SDOH factors and age-specific feature patterns. Notably, flooring material emerged as a strong predictor, reflecting socioeconomic and environmental disparities. Other influential factors, age, SES, lifestyle, social interaction, sleep, stress, and BMI, underscore the multifactorial nature of cognitive aging and the value of interpretable, data-driven SDOH modeling.


Speech-Based Cognitive Screening: A Systematic Evaluation of LLM Adaptation Strategies

Taherinezhad, Fatemeh, Nezhad, Mohamad Javad Momeni, Karimi, Sepehr, Rashidi, Sina, Zolnour, Ali, Dadkhah, Maryam, Haghbin, Yasaman, AzadMaleki, Hossein, Zolnoori, Maryam

arXiv.org Artificial Intelligence

Over half of US adults with Alzheimer disease and related dementias remain undiagnosed, and speech-based screening offers a scalable detection approach. We compared large language model adaptation strategies for dementia detection using the DementiaBank speech corpus, evaluating nine text-only models and three multimodal audio-text models on recordings from DementiaBank speech corpus. Adaptations included in-context learning with different demonstration selection policies, reasoning-augmented prompting, parameter-efficient fine-tuning, and multimodal integration. Results showed that class-centroid demonstrations achieved the highest in-context learning performance, reasoning improved smaller models, and token-level fine-tuning generally produced the best scores. Adding a classification head substantially improved underperforming models. Among multimodal models, fine-tuned audio-text systems performed well but did not surpass the top text-only models. These findings highlight that model adaptation strategies, including demonstration selection, reasoning design, and tuning method, critically influence speech-based dementia detection, and that properly adapted open-weight models can match or exceed commercial systems.


Advancing Automated Spatio-Semantic Analysis in Picture Description Using Language Models

Ng, Si-Ioi, Ambadi, Pranav S., Mueller, Kimberly D., Liss, Julie, Berisha, Visar

arXiv.org Artificial Intelligence

Current methods for automated assessment of cognitive-linguistic impairment via picture description often neglect the visual narrative path - the sequence and locations of elements a speaker described in the picture. Analyses of spatio-semantic features capture this path using content information units (CIUs), but manual tagging or dictionary-based mapping is labor-intensive. This study proposes a BERT-based pipeline, fine tuned with binary cross-entropy and pairwise ranking loss, for automated CIU extraction and ordering from the Cookie Theft picture description. Evaluated by 5-fold cross-validation, it achieves 93% median precision, 96% median recall in CIU detection, and 24% sequence error rates. The proposed method extracts features that exhibit strong Pearson correlations with ground truth, surpassing the dictionary-based baseline in external validation. These features also perform comparably to those derived from manual annotations in evaluating group differences via ANCOVA. The pipeline is shown to effectively characterize visual narrative paths for cognitive impairment assessment, with the implementation and models open-sourced to public.