 atrophy


S-Chain: Structured Visual Chain-of-Thought For Medicine

Le-Duc, Khai, Nguyen, Duy M. H., Trinh, Phuong T. H., Nguyen, Tien-Phat, Diep, Nghiem T., Ngo, An, Vu, Tung, Vuong, Trinh, Nguyen, Anh-Tien, Nguyen, Mau, Hoang, Van Trung, Nguyen, Khai-Nguyen, Nguyen, Hy, Ngo, Chris, Liu, Anji, Ho, Nhat, Hauschild, Anne-Christin, Nguyen, Khanh Xuan, Nguyen-Tang, Thanh, Xie, Pengtao, Sonntag, Daniel, Zou, James, Niepert, Mathias, Nguyen, Anh Totti

arXiv.org Artificial Intelligence

Faithful reasoning in medical vision-language models (VLMs) requires not only accurate predictions but also transparent alignment between textual rationales and visual evidence. While Chain-of-Thought (CoT) prompting has shown promise in medical visual question answering (VQA), no large-scale expert-level dataset has captured stepwise reasoning with precise visual grounding. We introduce S-Chain, the first large-scale dataset of 12,000 expert-annotated medical images with bounding boxes and structured visual CoT (SV-CoT), explicitly linking visual regions to reasoning steps. The dataset further supports 16 languages, totaling over 700k VQA pairs for broad multilingual applicability. Using S-Chain, we benchmark state-of-the-art medical VLMs (ExGra-Med, LLaVA-Med) and general-purpose VLMs (Qwen2.5-VL, InternVL2.5), showing that SV-CoT supervision significantly improves interpretability, grounding fidelity, and robustness. Beyond benchmarking, we study its synergy with retrieval-augmented generation, revealing how domain knowledge and visual grounding interact during autoregressive reasoning. Finally, we propose a new mechanism that strengthens the alignment between visual evidence and reasoning, improving both reliability and efficiency. S-Chain establishes a new benchmark for grounded medical reasoning and paves the way toward more trustworthy and explainable medical VLMs.


How do our bodies remember?

MIT Technology Review

The more we move, the more our muscle cells begin to make a memory of that exercise. "Like riding a bike" is shorthand for the remarkable way that our bodies remember how to move. Most of the time when we talk about muscle memory, we're not talking about the muscles themselves but about the memory of a coordinated movement pattern that lives in the motor neurons, which control our muscles. Yet in recent years, scientists have discovered that muscles themselves have a memory for movement and exercise.


Validation of a CT-brain analysis tool for measuring global cortical atrophy in older patient cohorts

Bal, Sukhdeep, Colbourne, Emma, Gan, Jasmine, Griffanti, Ludovica, Hanayik, Taylor, Demeyere, Nele, Davies, Jim, Pendlebury, Sarah T, Jenkinson, Mark

arXiv.org Artificial Intelligence

Quantification of brain atrophy currently relies on visual rating scales, which are time-consuming, so automated brain image analysis is warranted. We validated our automated deep learning (DL) tool measuring the Global Cerebral Atrophy (GCA) score against trained human raters, and examined associations with age and cognitive impairment, in representative older (>65 years) patients. CT-brain scans were obtained from patients in acute medicine (ORCHARD-EPR), acute stroke (OCS studies) and a legacy sample. Scans were divided in a 60/20/20 ratio for training, optimisation and testing. CT images were assessed by two trained raters (rater-1=864 scans, rater-2=20 scans). Agreement between DL tool-predicted GCA scores (range 0-39) and the visual ratings was evaluated using mean absolute error (MAE) and Cohen's weighted kappa. Among 864 scans (ORCHARD-EPR=578, OCS=200, legacy scans=86), MAE between the DL tool and rater-1 GCA scores was 3.2 overall, 3.1 for ORCHARD-EPR, 3.3 for OCS and 2.6 for the legacy scans; half of the scans had a DL-predicted GCA error between -2 and 2. Agreement was kappa=0.45 between the DL tool and rater-1 and 0.41 between the tool and rater-2, whereas it was lower, at 0.28, between rater-1 and rater-2. There was no difference in GCA scores from the DL tool and the two raters (one-way ANOVA, p=0.35) or in mean GCA scores between the DL tool and rater-1 (paired t-test, t=-0.43, p=0.66), the tool and rater-2 (t=1.35, p=0.18) or between rater-1 and rater-2 (t=0.99, p=0.32). DL tool GCA scores correlated with age and cognitive scores (both p<0.001). Our DL CT-brain analysis tool measured GCA score accurately and without user input in real-world scans acquired from older patients. Our tool will enable extraction of standardised quantitative measures of atrophy at scale for use in health data research and will act as proof-of-concept towards a point-of-care clinically approved tool.
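
The two agreement measures named in this abstract, mean absolute error and Cohen's weighted kappa, can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and the abstract does not say whether linear or quadratic weights were used, so the `power` parameter is an assumption (1 for linear, 2 for quadratic). Scores are assumed to be integers from 0 to `n_categories - 1`, so the 0-39 GCA range would correspond to `n_categories=40`.

```python
from collections import Counter


def mean_absolute_error(pred, ref):
    """Average absolute difference between paired scores."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def weighted_kappa(a, b, n_categories, power=1):
    """Cohen's weighted kappa for integer scores in 0..n_categories-1.

    power=1 uses linear disagreement weights |i - j|,
    power=2 uses quadratic weights (i - j)^2.
    The weighting scheme is an assumption, not taken from the abstract.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    observed = Counter(zip(a, b))  # joint counts of (rater A, rater B)
    marg_a = Counter(a)            # marginal counts for rater A
    marg_b = Counter(b)            # marginal counts for rater B
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = abs(i - j) ** power
            observed_disagreement += w * observed[(i, j)] / n
            expected_disagreement += w * marg_a[i] * marg_b[j] / (n * n)
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_disagreement / expected_disagreement
```

With made-up scores, `mean_absolute_error([10, 12, 20], [11, 12, 18])` averages the absolute errors 1, 0 and 2. Kappa equals 1 for identical ratings and 0 when agreement is no better than chance given each rater's marginal distribution.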


An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning

Zamai, Andrew, Fijalkow, Nathanael, Mansencal, Boris, Simon, Laurent, Navet, Eloi, Coupe, Pierrick

arXiv.org Artificial Intelligence

The differential diagnosis of neurodegenerative dementias is a challenging clinical task, mainly because of the overlap in symptom presentation and the similarity of patterns observed in structural neuroimaging. To improve diagnostic efficiency and accuracy, deep learning-based methods such as Convolutional Neural Networks and Vision Transformers have been proposed for the automatic classification of brain MRIs. However, despite their strong predictive performance, these models find limited clinical utility due to their opaque decision making. In this work, we propose a framework that integrates two core components to enhance diagnostic transparency. First, we introduce a modular pipeline for converting 3D T1-weighted brain MRIs into textual radiology reports. Second, we explore the potential of modern Large Language Models (LLMs) to assist clinicians in the differential diagnosis between Frontotemporal dementia subtypes, Alzheimer's disease, and normal aging based on the generated reports. To bridge the gap between predictive accuracy and explainability, we employ reinforcement learning to incentivize diagnostic reasoning in LLMs. Without requiring supervised reasoning traces or distillation from larger models, our approach enables the emergence of structured diagnostic rationales grounded in neuroimaging findings. Unlike post-hoc explainability methods that retrospectively justify model decisions, our framework generates diagnostic rationales as part of the inference process, producing causally grounded explanations that inform and guide the model's decision-making process. In doing so, our framework matches the diagnostic performance of existing deep learning methods while offering rationales that support its diagnostic conclusions.


Revisiting the Role of Relearning in Semantic Dementia

Jarvis, Devon, Klar, Verena, Klein, Richard, Rosman, Benjamin, Saxe, Andrew

arXiv.org Artificial Intelligence

Patients with semantic dementia (SD) present with remarkably consistent atrophy of neurons in the anterior temporal lobe and behavioural impairments, such as graded loss of category knowledge. While relearning of lost knowledge has been shown in acute brain injuries such as stroke, it has not been widely supported in chronic cognitive diseases such as SD. Previous research has shown that deep linear artificial neural networks exhibit stages of semantic learning akin to humans. Here, we use a deep linear network to test the hypothesis that relearning during disease progression, rather than particular atrophy, causes the specific behavioural patterns associated with SD. After training the network to generate the common semantic features of various hierarchically organised objects, neurons are successively deleted to mimic atrophy while retraining the model. The model with relearning and deleted neurons reproduced errors specific to SD, including prototyping errors and cross-category confusions. This suggests that relearning is necessary for artificial neural networks to reproduce the behavioural patterns associated with SD in the absence of output non-linearities. Our results support a theory of SD progression that results from continuous relearning of lost information. Future research should revisit the role of relearning as a contributing factor to cognitive diseases.
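
The procedure described in this abstract (train a deep linear network on hierarchical features, then delete hidden neurons while retraining) can be sketched as a toy experiment. This is only an illustration under stated assumptions, not the authors' model: the four-item hierarchy in `TARGETS`, the one-hot inputs, the eight hidden units, the learning rate, the epoch counts and all function names are hypothetical choices. The sketch compares the remaining error with and without relearning; it does not attempt to reproduce the SD-specific error patterns reported in the abstract.

```python
import random

# Hypothetical toy hierarchy: 4 items, 7 features.
# Features: [root, branch A, branch B, item 0, item 1, item 2, item 3]
TARGETS = [
    [1, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0, 1],
]
N_ITEMS, N_FEATURES, N_HIDDEN = 4, 7, 8


def init_weights(seed=0, scale=0.1):
    """Small random weights for a two-layer linear network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-scale, scale) for _ in range(N_ITEMS)] for _ in range(N_HIDDEN)]
    w2 = [[rng.uniform(-scale, scale) for _ in range(N_HIDDEN)] for _ in range(N_FEATURES)]
    return w1, w2


def predict(w1, w2, item, alive):
    """Forward pass for a one-hot input; deleted hidden units output zero."""
    hidden = [w1[h][item] if alive[h] else 0.0 for h in range(N_HIDDEN)]
    out = [sum(w2[f][h] * hidden[h] for h in range(N_HIDDEN)) for f in range(N_FEATURES)]
    return hidden, out


def loss(w1, w2, alive):
    """Squared error summed over features, averaged over items."""
    total = 0.0
    for item in range(N_ITEMS):
        _, out = predict(w1, w2, item, alive)
        total += sum((o - t) ** 2 for o, t in zip(out, TARGETS[item]))
    return total / N_ITEMS


def train(w1, w2, alive, epochs, lr=0.05):
    """Per-item gradient descent on half the squared error; updates weights in place."""
    for _ in range(epochs):
        for item in range(N_ITEMS):
            hidden, out = predict(w1, w2, item, alive)
            err = [o - t for o, t in zip(out, TARGETS[item])]
            for h in range(N_HIDDEN):
                if not alive[h]:
                    continue  # deleted units are never updated
                grad_hidden = sum(err[f] * w2[f][h] for f in range(N_FEATURES))
                for f in range(N_FEATURES):
                    w2[f][h] -= lr * err[f] * hidden[h]
                w1[h][item] -= lr * grad_hidden


def lesion_experiment(order=(0, 1, 2, 3, 4), relearn_epochs=300):
    """Return (trained loss, loss after deletion only, loss after deletion with relearning)."""
    w1, w2 = init_weights()
    train(w1, w2, [True] * N_HIDDEN, epochs=1000)
    trained = loss(w1, w2, [True] * N_HIDDEN)

    # Condition A: delete the units all at once, no relearning.
    alive_a = [h not in order for h in range(N_HIDDEN)]
    lesion_only = loss(w1, w2, alive_a)

    # Condition B: delete one unit at a time and retrain the survivors each time.
    w1_b = [row[:] for row in w1]
    w2_b = [row[:] for row in w2]
    alive_b = [True] * N_HIDDEN
    for h in order:
        alive_b[h] = False
        train(w1_b, w2_b, alive_b, epochs=relearn_epochs)
    with_relearning = loss(w1_b, w2_b, alive_b)
    return trained, lesion_only, with_relearning
```

Calling `lesion_experiment()` returns the three losses. With three surviving hidden units the network can represent at most a rank-3 map, so in this toy setup the error cannot drop below 0.25 in either condition; relearning lets the surviving units move toward that floor.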


Unsupervised Analysis of Alzheimer's Disease Signatures using 3D Deformable Autoencoders

Avci, Mehmet Yigit, Chan, Emily, Zimmer, Veronika, Rueckert, Daniel, Wiestler, Benedikt, Schnabel, Julia A., Bercea, Cosmin I.

arXiv.org Artificial Intelligence

With the increasing incidence of neurodegenerative diseases such as Alzheimer's Disease (AD), there is a need for further research that enhances detection and monitoring of the diseases. We present MORPHADE (Morphological Autoencoders for Alzheimer's Disease Detection), a novel unsupervised learning approach which uses deformations to allow the analysis of 3D T1-weighted brain images. To the best of our knowledge, this is the first use of deformations with deep unsupervised learning to not only detect, but also localize and assess the severity of structural changes in the brain due to AD. We obtain markedly higher anomaly scores in clinically important areas of the brain in subjects with AD compared to healthy controls, showcasing that our method is able to effectively locate AD-related atrophy. We additionally observe a visual correlation between the severity of atrophy highlighted in our anomaly maps and medial temporal lobe atrophy scores evaluated by a clinical expert. Finally, our method achieves an AUROC of 0.80 in detecting AD, out-performing several supervised and unsupervised baselines. We believe our framework shows promise as a tool towards improved understanding, monitoring and detection of AD. To support further research and application, we have made our code publicly available at github.com/ci-ber/MORPHADE. Keywords: Unsupervised learning, Registration, Classification
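
For readers unfamiliar with the AUROC figure quoted in this abstract, the following minimal sketch shows how an AUROC can be computed from per-subject anomaly scores. It is not taken from the MORPHADE repository: the function name is hypothetical, and the pairwise formulation (the fraction of AD/control pairs in which the AD subject receives the higher score, with ties counted as half) is one standard way to compute the metric.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: anomaly score per subject (higher means more anomalous).
    labels: 1 for a diseased subject, 0 for a healthy control.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative subject")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

With made-up scores `[0.9, 0.4, 0.6, 0.1]` and labels `[1, 1, 0, 0]`, three of the four positive/negative pairs are ranked correctly, giving an AUROC of 0.75. A value of 0.5 corresponds to chance-level ranking.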


Space Physiology and Technology: Musculoskeletal Adaptations, Countermeasures, and the Opportunity for Wearable Robotics

Khan, Shamas Ul Ebad, Varghese, Rejin John, Kassanos, Panagiotis, Farina, Dario, Burdet, Etienne

arXiv.org Artificial Intelligence

Space poses significant challenges for human physiology, leading to physiological adaptations in response to an environment vastly different from Earth. While these adaptations can be beneficial, they may not fully counteract the adverse impact of space-related stressors. A comprehensive understanding of these physiological adaptations is needed to devise effective countermeasures to support human life in space. This review focuses on the impact of the environment in space on the musculoskeletal system. It highlights the complex interplay between bone and muscle adaptation, the underlying physiological mechanisms, and their implications for astronaut health. Furthermore, the review examines both deployed countermeasures and current advances in countermeasures, and proposes, as a perspective for future developments, wearable sensing and robotic technologies, such as exoskeletons, as a fitting alternative.


Deep-learning-based clustering of OCT images for biomarker discovery in age-related macular degeneration (Pinnacle study report 4)

Holland, Robbie, Kaye, Rebecca, Hagag, Ahmed M., Leingang, Oliver, Taylor, Thomas R. P., Bogunović, Hrvoje, Schmidt-Erfurth, Ursula, Scholl, Hendrik P. N., Rueckert, Daniel, Lotery, Andrew J., Sivaprasad, Sobha, Menten, Martin J.

arXiv.org Artificial Intelligence

Diseases are currently managed using grading systems that stratify patients into stages, which indicate patient risk and guide clinical management. However, these broad categories typically lack prognostic value, and proposals for new biomarkers are currently limited to anecdotal observations. In this work, we introduce a deep-learning-based biomarker proposal system for the purpose of accelerating biomarker discovery in age-related macular degeneration (AMD). It works by first training a neural network using self-supervised contrastive learning to discover, without any clinical annotations, features relating to both known and unknown AMD biomarkers present in 46,496 retinal optical coherence tomography (OCT) images. To interpret the discovered biomarkers, we partition the images into 30 subsets, termed clusters, that contain similar features. We then conduct two parallel 1.5-hour semi-structured interviews with two independent teams of retinal specialists that describe each cluster in clinical language. Overall, both teams independently identified clearly distinct characteristics in 27 of 30 clusters, of which 23 were related to AMD. Seven were recognised as known biomarkers already used in established grading systems and 16 depicted biomarker combinations or subtypes that are either not yet used in grading systems, were only recently proposed, or were unknown. Clusters separated incomplete from complete retinal atrophy, intraretinal from subretinal fluid and thick from thin choroids, and in simulation outperformed clinically used grading systems in prognostic value. Overall, contrastive learning enabled the automatic proposal of AMD biomarkers that go beyond the set used by clinically established grading systems. Ultimately, we envision that equipping clinicians with discovery-oriented deep-learning tools can accelerate discovery of novel prognostic biomarkers.


Dimensional Neuroimaging Endophenotypes: Neurobiological Representations of Disease Heterogeneity Through Machine Learning

Wen, Junhao, Antoniades, Mathilde, Yang, Zhijian, Hwang, Gyujoon, Skampardoni, Ioanna, Wang, Rongguang, Davatzikos, Christos

arXiv.org Artificial Intelligence

Machine learning has been increasingly used to obtain individualized neuroimaging signatures for disease diagnosis, prognosis, and response to treatment in neuropsychiatric and neurodegenerative disorders. Therefore, it has contributed to a better understanding of disease heterogeneity by identifying disease subtypes that present significant differences in various brain phenotypic measures. In this review, we first present a systematic literature overview of studies using machine learning and multimodal MRI to unravel disease heterogeneity in various neuropsychiatric and neurodegenerative disorders, including Alzheimer disease, schizophrenia, major depressive disorder, autism spectrum disorder, multiple sclerosis, as well as their potential in transdiagnostic settings. Subsequently, we summarize relevant machine learning methodologies and discuss an emerging paradigm which we call dimensional neuroimaging endophenotype (DNE). DNE dissects the neurobiological heterogeneity of neuropsychiatric and neurodegenerative disorders into a low dimensional yet informative, quantitative brain phenotypic representation, serving as a robust intermediate phenotype (i.e., endophenotype) largely reflecting underlying genetics and etiology. Finally, we discuss the potential clinical implications of the current findings and envision future research avenues.


Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales

Kwon, Taeyoon, Ong, Kai Tzu-iunn, Kang, Dongjin, Moon, Seungjun, Lee, Jeong Ryong, Hwang, Dosik, Sim, Yongsik, Sohn, Beomseok, Lee, Dongha, Yeo, Jinyoung

arXiv.org Artificial Intelligence

Machine reasoning has made great progress in recent years owing to large language models (LLMs). In the clinical domain, however, most NLP-driven projects mainly focus on clinical classification or reading comprehension, and under-explore clinical reasoning for disease diagnosis due to the expensive rationale annotation with clinicians. In this work, we present a "reasoning-aware" diagnosis framework that rationalizes the diagnostic process via prompt-based learning in a time- and labor-efficient manner, and learns to reason over the prompt-generated rationales. Specifically, we address the clinical reasoning for disease diagnosis, where the LLM generates diagnostic rationales providing its insight on presented patient data and the reasoning path towards the diagnosis, namely Clinical Chain-of-Thought (Clinical CoT). We empirically demonstrate LLMs/LMs' ability of clinical reasoning via extensive experiments and analyses on both rationale generation and disease diagnosis in various settings. We further propose a novel set of criteria for evaluating machine-generated rationales' potential for real-world clinical settings, facilitating and benefiting future research in this area.