AITopics | Diagnosis

Collaborating Authors

Diagnosis

News Overviews Instructional Materials AI-Alerts Classics

Sim4Seg: Boosting Multimodal Multi-disease Medical Diagnosis Segmentation with Region-Aware Vision-Language Similarity Masks

Song, Lingran, Zhou, Yucheng, Shen, Jianbing

arXiv.org Artificial IntelligenceNov-11-2025

Despite significant progress in pixel-level medical image analysis, existing medical image segmentation models rarely explore medical segmentation and diagnosis tasks jointly. However, it is crucial for patients that models can provide explainable diagnoses along with medical segmentation results. In this paper, we introduce a medical vision-language task named Medical Diagnosis Segmentation (MDS), which aims to understand clinical queries for medical images and generate the corresponding segmentation masks as well as diagnostic results. To facilitate this task, we first present the Multimodal Multi-disease Medical Diagnosis Segmentation (M3DS) dataset, containing diverse multimodal multi-disease medical images paired with their corresponding segmentation masks and diagnosis chain-of-thought, created via an automated diagnosis chain-of-thought generation pipeline. Moreover, we propose Sim4Seg, a novel framework that improves the performance of diagnosis segmentation by taking advantage of the Region-Aware Vision-Language Similarity to Mask (RVLS2M) module. To improve overall performance, we investigate a test-time scaling strategy for MDS tasks. Experimental results demonstrate that our method outperforms the baselines in both segmentation and diagnosis.

artificial intelligence, machine learning, segmentation, (14 more...)

arXiv.org Artificial Intelligence

2511.06665

Country: Europe > Austria (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.34)

Add feedback

Understanding Cross Task Generalization in Handwriting-Based Alzheimer's Screening via Vision Language Adaptation

Gong, Changqing, Qin, Huafeng, El-Yacoubi, Mounim A.

arXiv.org Artificial IntelligenceNov-11-2025

Alzheimer's disease is a prevalent neurodegenerative disorder for which early detection is critical. Handwriting-often disrupted in prodromal AD-provides a non-invasive and cost-effective window into subtle motor and cognitive decline. Existing handwriting-based AD studies, mostly relying on online trajectories and hand-crafted features, have not systematically examined how task type influences diagnostic performance and cross-task generalization. Meanwhile, large-scale vision language models have demonstrated remarkable zero or few-shot anomaly detection in natural images and strong adaptability across medical modalities such as chest X-ray and brain MRI. However, handwriting-based disease detection remains largely unexplored within this paradigm. To close this gap, we introduce a lightweight Cross-Layer Fusion Adapter framework that repurposes CLIP for handwriting-based AD screening. CLFA implants multi-level fusion adapters within the visual encoder to progressively align representations toward handwriting-specific medical cues, enabling prompt-free and efficient zero-shot inference. Using this framework, we systematically investigate cross-task generalization-training on a specific handwriting task and evaluating on unseen ones-to reveal which task types and writing patterns most effectively discriminate AD. Extensive analyses further highlight characteristic stroke patterns and task-level factors that contribute to early AD identification, offering both diagnostic insights and a benchmark for handwriting-based cognitive assessment.

data mining, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.05841

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Data Science > Data Mining (0.90)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
(2 more...)

Add feedback

Learning to reason about rare diseases through retrieval-augmented agents

Kim, Ha Young, Li, Jun, Solana, Ana Beatriz, Pirkl, Carolin M., Wiestler, Benedikt, Schnabel, Julia A., Bercea, Cosmin I.

arXiv.org Artificial IntelligenceNov-10-2025

Rare diseases represent the long tail of medical imaging, where AI models often fail due to the scarcity of representative training data. In clinical workflows, radiologists frequently consult case reports and literature when confronted with unfamiliar findings. Following this line of reasoning, we introduce RADAR, Retrieval Augmented Diagnostic Reasoning Agents, an agentic system for rare disease detection in brain MRI. Our approach uses AI agents with access to external medical knowledge by embedding both case reports and literature using sentence transformers and indexing them with FAISS to enable efficient similarity search. The agent retrieves clinically relevant evidence to guide diagnostic decision making on unseen diseases, without the need of additional training. Designed as a model-agnostic reasoning module, RADAR can be seamlessly integrated with diverse large language models, consistently improving their rare pathology recognition and interpretability. On the NOVA dataset comprising 280 distinct rare diseases, RADAR achieves up to a 10.2% performance gain, with the strongest improvements observed for open source models such as DeepSeek. Beyond accuracy, the retrieved examples provide interpretable, literature grounded explanations, highlighting retrieval-augmented reasoning as a powerful paradigm for low-prevalence conditions in medical imaging.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.0472

Country:

Europe > Germany (0.18)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.95)
(2 more...)

Add feedback

Cross-modal Causal Intervention for Alzheimer's Disease Prediction

Jin, Yutao, Xiao, Haowen, Zhai, Junyong, Li, Yuxiao, Chu, Jielei, Lv, Fengmao, Li, Yuxiao

arXiv.org Artificial IntelligenceNov-7-2025

Mild Cognitive Impairment (MCI) serves as a prodromal stage of Alzheimer's Disease (AD), where early identification and intervention can effectively slow the progression to dementia. However, diagnosing AD remains a significant challenge in neurology due to the confounders caused mainly by the selection bias of multi-modal data and the complex relationships between variables. To address these issues, we propose a novel visual-language causality-inspired framework named Cross-modal Causal Intervention with Mediator for Alzheimer's Disease Diagnosis (MediAD) for diagnostic assistance. Our MediAD employs Large Language Models (LLMs) to summarize clinical data under strict templates, therefore enriching textual inputs. The MediAD model utilizes Magnetic Resonance Imaging (MRI), clinical data, and textual data enriched by LLMs to classify participants into Cognitively Normal (CN), MCI, and AD categories. Because of the presence of confounders, such as cerebral vascular lesions and age-related biomarkers, non-causal models are likely to capture spurious input-output correlations, generating less reliable results. Our framework implicitly mitigates the effect of both observable and unobservable confounders through a unified causal intervention method. Experimental results demonstrate the outstanding performance of our method in distinguishing CN/MCI/AD cases, outperforming other methods in most evaluation metrics. The study showcases the potential of integrating causal reasoning with multi-modal learning for neurological disease diagnosis.

confounder, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2507.13956

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (0.87)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Towards Scalable Meta-Learning of near-optimal Interpretable Models via Synthetic Model Generations

Myint, Kyaw Hpone, Wu, Zhe, Day, Alexandre G. R., Iyengar, Giri

arXiv.org Machine LearningNov-7-2025

Decision trees are widely used in high-stakes fields like finance and healthcare due to their interpretability. This work introduces an efficient, scalable method for generating synthetic pre-training data to enable meta-learning of decision trees. Our approach samples near-optimal decision trees synthetically, creating large-scale, realistic datasets. Using the MetaTree transformer architecture, we demonstrate that this method achieves performance comparable to pre-training on real-world data or with computationally expensive optimal decision trees. This strategy significantly reduces computational costs, enhances data generation flexibility, and paves the way for scalable and efficient meta-learning of interpretable decision tree models.

artificial intelligence, decision tree learning, machine learning, (17 more...)

arXiv.org Machine Learning

2511.04

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics

Buchholz, Markus, Carlucho, Ignacio, Petillot, Yvan R.

arXiv.org Artificial IntelligenceNov-6-2025

The safe deployment of autonomous systems in safety-critical settings requires a paradigm that combines human expertise with AI-driven analysis, especially when anomalies are unforeseen. We introduce AURA (Autonomous Resilience Agent), a collaborative framework for anomaly and fault diagnostics in robotics. AURA integrates large language models (LLMs), a high-fidelity digital twin (DT), and human-in-the-loop interaction to detect and respond to anomalous behavior in real time. The architecture uses two agents with clear roles: (i) a low-level State Anomaly Characterization Agent that monitors telemetry and converts signals into a structured natural-language problem description, and (ii) a high-level Diagnostic Reasoning Agent that conducts a knowledge-grounded dialogue with an operator to identify root causes, drawing on external sources. Human-validated diagnoses are then converted into new training examples that refine the low-level perceptual model. This feedback loop progressively distills expert knowledge into the AI, transforming it from a static tool into an adaptive partner. We describe the framework's operating principles and provide a concrete implementation, establishing a pattern for trustworthy, continually improving human-robot teams.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.03075

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Health & Medicine > Diagnostic Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

UniFault: A Fault Diagnosis Foundation Model from Bearing Data

Eldele, Emadeldeen, Ragab, Mohamed, Qing, Xu, Edward, null, Chen, Zhenghua, Wu, Min, Li, Xiaoli, Lee, Jay

arXiv.org Artificial IntelligenceNov-6-2025

Machine fault diagnosis (FD) is a critical task for predictive maintenance, enabling early fault detection and preventing unexpected failures. Despite its importance, existing FD models are operation-specific with limited generalization across diverse datasets. Foundation models (FM) have demonstrated remarkable potential in both visual and language domains, achieving impressive generalization capabilities even with minimal data through few-shot or zero-shot learning. However, translating these advances to FD presents unique hurdles. Unlike the large-scale, cohesive datasets available for images and text, FD datasets are typically smaller and more heterogeneous, with significant variations in sampling frequencies and the number of channels across different systems and applications. This heterogeneity complicates the design of a universal architecture capable of effectively processing such diverse data while maintaining robust feature extraction and learning capabilities. In this paper, we introduce UniFault, a foundation model for fault diagnosis that systematically addresses these issues. Specifically, the model incorporates a comprehensive data harmonization pipeline featuring two key innovations. First, a unification scheme transforms multivariate inputs into standardized univariate sequences. Second, a novel cross-domain temporal fusion strategy mitigates distribution shifts and enriches sample diversity and count, improving the model generalization across varying conditions. UniFault is pretrained on over 6.9 million samples spanning diverse FD datasets, enabling superior few-shot performance. Extensive experiments on real-world FD datasets demonstrate that UniFault achieves state-of-the-art performance, setting a new benchmark for fault diagnosis models and paving the way for more scalable and robust predictive maintenance solutions.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2504.01373

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Approaching Low-Cost Cardiac Intelligence with Semi-Supervised Knowledge Distillation

Zhou, Rushuang, Zhang, Yuan-Ting, Deen, M. Jamal, Dong, Yining

arXiv.org Artificial IntelligenceNov-6-2025

Deploying advanced cardiac artificial intelligence for daily cardiac monitoring is hindered by its reliance on extensive medical data and high computational resources. Low-cost cardiac intelligence (LCCI) offers a promising alternative by using wearable device data, such as 1-lead electrocardiogram (ECG), but it suffers from a significant diagnostic performance gap compared to high-cost cardiac intelligence (HCCI). To bridge this gap, we propose LiteHeart, a semi-supervised knowledge distillation framework. LiteHeart introduces a region-aware distillation module to mimic how cardiologists focus on diagnostically relevant ECG regions and a cross-layer mutual information module to align the decision processes of LCCI and HCCI systems. Using a semi-supervised training strategy, LiteHeart further improves model robustness under limited supervision. Evaluated on five datasets covering over 38 cardiovascular diseases, LiteHeart substantially reduces the performance gap between LCCI and HCCI, outperforming existing methods by 4.27% to 7.10% in macro F1 score. These results demonstrate that LiteHeart significantly enhances the diagnostic capabilities of low-cost cardiac intelligence systems, paving the way for scalable, affordable, and accurate daily cardiac healthcare using wearable technologies.

artificial intelligence, cardiac intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.02851

Country:

North America (0.67)
Asia > China > Zhejiang Province (0.15)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.68)

Add feedback

PRISM2: Unlocking Multi-Modal General Pathology AI with Clinical Dialogue

Vorontsov, Eugene, Shaikovski, George, Casson, Adam, Viret, Julian, Zimmermann, Eric, Tenenholtz, Neil, Wang, Yi Kan, Bernhard, Jan H., Godrich, Ran A., Retamero, Juan A., Shia, Jinru, Gonen, Mithat, Weiser, Martin R., Klimstra, David S., Yousfi, Razik, Fusi, Nicolo, Fuchs, Thomas J., Severson, Kristen, Liu, Siqi

arXiv.org Artificial IntelligenceNov-4-2025

Recent rapid progress in the field of computational pathology has been enabled by foundation models. These models are beginning to move beyond encoding image patches towards whole-slide understanding but their clinical utility remains limited. In this work, we present PRISM2, a multimodal slide-level foundation model trained on data from 700,000 diagnostic specimen-report pairs, the largest vision (2.3 million whole slide images) and language (14M question-answer pairs) histopathology dataset to date. By learning through clinical-dialogue supervision, PRISM2 aligns histomorphologic features with the language of diagnostic reasoning, producing slide-level representations that support both direct diagnostic question-answering and transferable embeddings for downstream tasks. Without additional training, PRISM2 matches or exceeds the cancer-detection performance of clinical-grade products. This is observed without loss of generality on other tasks, where PRISM2 achieves top performance. Finally, using survival prediction as the example, we show that task-specific finetuning with a large dataset can outperform task-specific models, further improving performance. These results demonstrate how language-supervised pretraining provides a scalable, clinically grounded signal for learning generalizable pathology representations, bridging human diagnostic reasoning and foundation-model performance.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.13063

Country: North America > United States (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Carcinoma (1.00)
Health & Medicine > Therapeutic Area > Endocrinology (1.00)
Health & Medicine > Therapeutic Area > Dermatology (1.00)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.88)
(2 more...)

Add feedback

A tutorial on discovering and quantifying the effect of latent causal sources of multimodal EHR data

Barbero-Mota, Marco, Strobl, Eric V., Still, John M., Stead, William W., Lasko, Thomas A.

arXiv.org Artificial IntelligenceNov-3-2025

We provide an accessible description of a peer-reviewed generalizable causal machine learning pipeline to (i) discover latent causal sources of large-scale electronic health records observations, and (ii) quantify the source causal effects on clinical outcomes. We illustrate how imperfect multimodal clinical data can be processed, decomposed into probabilistic independent latent sources, and used to train taskspecific causal models from which individual causal effects can be estimated. We summarize the findings of the two real-world applications of the approach to date as a demonstration of its versatility and utility for medical discovery at scale.

artificial intelligence, latent causal source, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.16026

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Technology > Medical Record (0.71)
Health & Medicine > Therapeutic Area > Immunology (0.68)
Health & Medicine > Diagnostic Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.36)

Add feedback