Diagnosis
FT-MDT: Extracting Decision Trees from Medical Texts via a Novel Low-rank Adaptation Method
Li, Yuheng, Gao, Jiechao, Han, Wei, Ouyang, Wenwen, Zhu, Wei, Leong, Hui Yi
Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical to building clinical decision support systems. However, current MDT construction methods rely heavily on time-consuming and laborious manual annotation. To address this challenge, we propose PI-LoRA (Path-Integrated LoRA), a novel low-rank adaptation method for automatically extracting MDTs from clinical guidelines and textbooks. We integrate gradient path information to capture synergistic effects between different modules, enabling more effective and reliable rank allocation. This framework ensures that the most critical modules receive appropriate rank allocations while less important ones are pruned, resulting in a more efficient and accurate model for extracting medical decision trees from clinical texts. Extensive experiments on medical guideline datasets demonstrate that our PI-LoRA method significantly outperforms existing parameter-efficient fine-tuning approaches for the Text2MDT task, achieving better accuracy with substantially reduced model complexity. The proposed method achieves state-of-the-art results while maintaining a lightweight architecture, making it particularly suitable for clinical decision support systems where computational resources may be limited.
Combining SHAP and Causal Analysis for Interpretable Fault Detection in Industrial Processes
Santos, Pedro Cortes dos, Rocha, Matheus Becali, Krohling, Renato A
Industrial processes generate complex data that challenge fault detection systems, often yielding opaque or underwhelming results despite advanced machine learning techniques. This study tackles such difficulties using the Tennessee Eastman Process, a well-established benchmark known for its intricate dynamics, to develop an innovative fault detection framework. Initial attempts with standard models revealed limitations in both performance and interpretability, prompting a shift toward a more tractable approach. By employing SHAP (SHapley Additive exPlanations), we transform the problem into a more manageable and transparent form, pinpointing the most critical process features driving fault predictions. This reduction in complexity unlocks the ability to apply causal analysis through Directed Acyclic Graphs, generated by multiple algorithms, to uncover the underlying mechanisms of fault propagation. The resulting causal structures align strikingly with SHAP findings, consistently highlighting key process elements-like cooling and separation systems-as pivotal to fault development. Together, these methods not only enhance detection accuracy but also provide operators with clear, actionable insights into fault origins, a synergy that, to our knowledge, has not been previously explored in this context. This dual approach bridges predictive power with causal understanding, offering a robust tool for monitoring complex manufacturing environments and paving the way for smarter, more interpretable fault detection in industrial systems.
Evolving Diagnostic Agents in a Virtual Clinical Environment
Qiu, Pengcheng, Wu, Chaoyi, Liu, Junwei, Zheng, Qiaoyu, Liao, Yusheng, Wang, Haowen, Yue, Yun, Fan, Qianrui, Zhen, Shuai, Wang, Jian, Gu, Jinjie, Wang, Yanfeng, Zhang, Ya, Xie, Weidi
In this paper, we present a framework for training large language models (LLMs) as diagnostic agents with reinforcement learning, enabling them to manage multi-turn diagnostic processes, adaptively select examinations, and commit to final diagnoses. Unlike instruction-tuned models trained on static case summaries, our method acquires diagnostic strategies through interactive exploration and outcome-based feedback. Our contributions are fourfold: (i) We present DiagGym, a diagnostics world model trained with electronic health records that emits examination outcomes conditioned on patient history and recommended examination, serving as a virtual clinical environment for realistic diagnosis training and evaluation; (ii) We train DiagAgent via end-to-end, multi-turn reinforcement learning to learn diagnostic policies that optimize both information yield and diagnostic accuracy; (iii) We introduce DiagBench, a diagnostic benchmark comprising 750 cases with physician-validated examination recommendations and 99 cases annotated with 973 physician-written rubrics on diagnosis process; (iv) we demonstrate superior performance across diverse diagnostic settings. DiagAgent significantly outperforms 10 state-of-the-art LLMs, including DeepSeek-v3 and GPT-4o, as well as two prompt-engineered agents. In single-turn settings, DiagAgent achieves 9.34% higher diagnostic accuracy and 44.03% improvement in examination recommendation hit ratio. In end-to-end settings, it delivers 15.12% increase in diagnostic accuracy and 23.09% boost in examination recommendation F1 score. In rubric-based evaluation, it surpasses the next-best model, Claude-sonnet-4, by 7.1% in weighted rubric score. These findings indicate that learning policies in interactive clinical environments confers dynamic and clinically meaningful diagnostic management abilities unattainable through passive training alone.
Towards Error-Centric Intelligence II: Energy-Structured Causal Models
Contemporary machine learning optimizes for predictive accuracy, yet systems that achieve state of the art performance remain causally opaque: their internal representations provide no principled handle for intervention. We can retrain such models, but we cannot surgically edit specific mechanisms while holding others fixed, because learned latent variables lack causal semantics. We argue for a conceptual reorientation: intelligence is the ability to build and refine explanations, falsifiable claims about manipulable structure that specify what changes and what remains invariant under intervention. Explanations subsume prediction but demand more: causal commitments that can be independently tested and corrected at the level of mechanisms. We introduce computational explanations, mappings from observations to intervention ready causal accounts. We instantiate these explanations with Energy Structured Causal Models (ESCMs), in which mechanisms are expressed as constraints (energy functions or vector fields) rather than explicit input output maps, and interventions act by local surgery on those constraints. This shift makes internal structure manipulable at the level where explanations live: which relations must hold, which can change, and what follows when they do. We provide concrete instantiations of the structural-causal principles LAP and ICM in the ESCM context, and also argue that empirical risk minimization systematically produces fractured, entangled representations, a failure we analyze as gauge ambiguity in encoder energy pairs. Finally, we show that under mild conditions, ESCMs recover standard SCM semantics. Building on Part I's principles (LAP, ICM, CAP) and its definition of intelligence as explanation-building under criticism, this paper offers a formal language for causal reasoning in systems that aspire to understand, not merely to predict.
Bag-of-Word-Groups (BoWG): A Robust and Efficient Loop Closure Detection Method Under Perceptual Aliasing
Fei, Xiang, Tian, Tina, Choset, Howie, Li, Lu
Loop closure is critical in Simultaneous Localization and Mapping (SLAM) systems to reduce accumulative drift and ensure global mapping consistency. However, conventional methods struggle in perceptually aliased environments, such as narrow pipes, due to vector quantization, feature sparsity, and repetitive textures, while existing solutions often incur high computational costs. This paper presents Bag-of-Word-Groups (BoWG), a novel loop closure detection method that achieves superior precision-recall, robustness, and computational efficiency. The core innovation lies in the introduction of word groups, which captures the spatial co-occurrence and proximity of visual words to construct an online dictionary. Additionally, drawing inspiration from probabilistic transition models, we incorporate temporal consistency directly into similarity computation with an adaptive scheme, substantially improving precision-recall performance. The method is further strengthened by a feature distribution analysis module and dedicated post-verification mechanisms. To evaluate the effectiveness of our method, we conduct experiments on both public datasets and a confined-pipe dataset we constructed. Results demonstrate that BoWG surpasses state-of-the-art methods, including both traditional and learning-based approaches, in terms of precision-recall and computational efficiency. Our approach also exhibits excellent scalability, achieving an average processing time of 16 ms per image across 17,565 images in the Bicocca25b dataset.
Foundational theory for optimal decision tree problems. II. Optimal hypersurface decision tree algorithm
Decision trees are a ubiquitous model for classification and regression tasks due to their interpretability and efficiency. However, solving the optimal decision tree (ODT) problem remains a challenging combinatorial optimization task. Even for the simplest splitting rules--axis-parallel hyperplanes--it is NP-hard to optimize. In Part I of this series, we rigorously defined the proper decision tree model through four axioms and, based on these, introduced four formal definitions of the ODT problem. From these definitions, we derived four generic algorithms capable of solving ODT problems for arbitrary decision trees satisfying the axioms. We also analyzed the combinatorial geometric properties of hypersurfaces, showing that decision trees defined by polynomial hypersurface splitting rules satisfy the proper axioms that we proposed. In this second paper (Part II) of this two-part series, building on the algorithmic and geometric foundations established in Part I, we introduce the first hypersurface decision tree (HODT) algorithm. To the best of our knowledge, existing optimal decision tree methods are, to date, limited to hyperplane splitting rules--a special case of hypersurfaces--and rely on general-purpose solvers. In contrast, our HODT algorithm addresses the general hypersurface decision tree model without requiring external solvers. Using synthetic datasets generated from ground-truth hyperplane decision trees, we vary tree size, data size, dimensionality, and label and feature noise. Results showing that our algorithm recovers the ground truth more accurately than axis-parallel trees and exhibits greater robustness to noise. We also analyzed generalization performance across 30 real-world datasets, showing that HODT can achieve up to 30% higher accuracy than the state-of-the-art optimal axis-parallel decision tree algorithm when tree complexity is properly controlled.
CXRAgent: Director-Orchestrated Multi-Stage Reasoning for Chest X-Ray Interpretation
Lou, Jinhui, Yang, Yan, Yu, Zhou, Fu, Zhenqi, Han, Weidong, Huang, Qingming, Yu, Jun
Abstract--Chest X-ray (CXR) plays a pivotal role in clinical diagnosis, and a variety of task-specific and foundation models have been developed for automatic CXR interpretation. However, these models often struggle to adapt to new diagnostic tasks and complex reasoning scenarios. Recently, LLM-based agent models have emerged as a promising paradigm for CXR analysis, enhancing model's capability through tool coordination, multi-step reasoning, and team collaboration, etc. However, existing agents often rely on a single diagnostic pipeline and lack mechanisms for assessing tools' reliability, limiting their adaptability and credibility. T o this end, we propose CXRAgent, a director-orchestrated, multi-stage agent for CXR interpretation, where a central director coordinates the following stages: (1) T ool Invocation: The agent strategically orchestrates a set of CXR-analysis tools, with outputs normalized and verified by the Evidence-driven V alidator (EDV), which grounds diagnostic outputs with visual evidence to support reliable downstream diagnosis; (2) Diagnostic Planning: Guided by task requirements and intermediate findings, the agent formulates a targeted diagnostic plan. It then assembles an expert team accordingly, defining member roles and coordinating their interactions to enable adaptive and collaborative reasoning; (3) Collaborative Decision-making: The agent integrates insights from the expert team with accumulated contextual memories, synthesizing them into an evidence-backed diagnostic conclusion. Experiments on various CXR interpretation tasks show that CXRAgent delivers strong performance, providing visual evidence and generalizes well to clinical tasks of different complexity. Code and data are valuable at this link. HEST X-ray (CXR) is among the most widely used imaging modalities in clinical practice due to their affordability, rapid acquisition, and diagnostic utility across a wide range of thoracic conditions. This work has been submitted to the IEEE for possible publication.
PSO-XAI: A PSO-Enhanced Explainable AI Framework for Reliable Breast Cancer Detection
Raquib, Mirza, Das, Niloy, Prity, Farida Siddiqi, Fahim, Arafath Al, Murad, Saydul Akbar, Hossain, Mohammad Amzad, Hoque, MD Jiabul, Moni, Mohammad Ali
Breast cancer is considered the most critical and frequently diagnosed cancer in women worldwide, leading to an increase in cancer-related mortality. Early and accurate detection is crucial as it can help mitigate possible threats while improving survival rates. In terms of prediction, conventional diagnostic methods are often limited by variability, cost, and, most importantly, risk of misdiagnosis. To address these challenges, machine learning (ML) has emerged as a powerful tool for computer-aided diagnosis, with feature selection playing a vital role in improving model performance and interpretability. This research study proposes an integrated framework that incorporates customized Particle Swarm Optimization (PSO) for feature selection. This framework has been evaluated on a comprehensive set of 29 different models, spanning classical classifiers, ensemble techniques, neural networks, probabilistic algorithms, and instance-based algorithms. To ensure interpretability and clinical relevance, the study uses cross-validation in conjunction with explainable AI methods. Experimental evaluation showed that the proposed approach achieved a superior score of 99.1\% across all performance metrics, including accuracy and precision, while effectively reducing dimensionality and providing transparent, model-agnostic explanations. The results highlight the potential of combining swarm intelligence with explainable ML for robust, trustworthy, and clinically meaningful breast cancer diagnosis.
Natural Language Processing for Cardiology: A Narrative Review
Yang, Kailai, Leng, Yan, Zhang, Xin, Zhang, Tianlin, Thompson, Paul, Keavney, Bernard, Tomaszewski, Maciej, Ananiadou, Sophia
Cardiovascular diseases are becoming increasingly prevalent in modern society, with a profound impact on global health and well-being. These Cardiovascular disorders are complex and multifactorial, influenced by genetic predispositions, lifestyle choices, and diverse socioeconomic and clinical factors. Information about these interrelated factors is dispersed across multiple types of textual data, including patient narratives, medical records, and scientific literature. Natural language processing (NLP) has emerged as a powerful approach for analysing such unstructured data, enabling healthcare professionals and researchers to gain deeper insights that may transform the diagnosis, treatment, and prevention of cardiac disorders. This review provides a comprehensive overview of NLP research in cardiology from 2014 to 2025. We systematically searched six literature databases for studies describing NLP applications across a range of cardiovascular diseases. After a rigorous screening process, we identified 265 relevant articles. Each study was analysed across multiple dimensions, including NLP paradigms, cardiology-related tasks, disease types, and data sources. Our findings reveal substantial diversity within these dimensions, reflecting the breadth and evolution of NLP research in cardiology. A temporal analysis further highlights methodological trends, showing a progression from rule-based systems to large language models. Finally, we discuss key challenges and future directions, such as developing interpretable LLMs and integrating multimodal data. To the best of our knowledge, this review represents the most comprehensive synthesis of NLP research in cardiology to date.
Benchmarking noisy label detection methods
Pickler, Henrique, Kamassury, Jorge K. S., Silva, Danilo
Label noise is a common problem in real-world datasets, affecting both model training and validation. Clean data are essential for achieving strong performance and ensuring reliable evaluation. While various techniques have been proposed to detect noisy labels, there is no clear consensus on optimal approaches. We perform a comprehensive benchmark of detection methods by decomposing them into three fundamental components: label agreement function, aggregation method, and information gathering approach (in-sample vs out-of-sample). This decomposition can be applied to many existing detection methods, and enables systematic comparison across diverse approaches. To fairly compare methods, we propose a unified benchmark task, detecting a fraction of training samples equal to the dataset's noise rate. We also introduce a novel metric: the false negative rate at this fixed operating point. We identify that in-sample information gathering using average probability aggregation combined with the logit margin as the label agreement function achieves the best results across most scenarios. Our findings provide practical guidance for designing new detection methods and selecting techniques for specific applications. Keywords: Noisy label detection, Noisy labels, Dataset cleaning, Data quality, Benchmark, Neural networks 1. Introduction Most supervised learning methods assume a perfectly labeled dataset. However, training data often contain incorrectly labeled instances. Even large, standard benchmark datasets, such as CIFAR, ImageNet, and MS-COCO, are known to have noisy labels [1, 2].