 stratification


AutoLugano: A Deep Learning Framework for Fully Automated Lymphoma Segmentation and Lugano Staging on FDG-PET/CT

Pan, Boyang, Zhang, Zeyu, Meng, Hongyu, Cui, Bin, Zhang, Yingying, Hou, Wenli, Li, Junhao, Zhong, Langdi, Chen, Xiaoxiao, Xu, Xiaoyu, Zuo, Changjin, Cheng, Chao, Gong, Nan-Jie

arXiv.org Artificial Intelligence

Purpose: To develop a fully automated deep learning system, AutoLugano, for end-to-end lymphoma classification by performing lesion segmentation, anatomical localization, and automated Lugano staging from baseline FDG-PET/CT scans. Methods: The AutoLugano system processes baseline FDG-PET/CT scans through three sequential modules: (1) Anatomy-Informed Lesion Segmentation, in which a 3D nnU-Net model trained on multi-channel inputs performs automated lesion detection; (2) Atlas-based Anatomical Localization, which leverages the TotalSegmentator toolkit to map segmented lesions to 21 predefined lymph node regions using deterministic anatomical rules; and (3) Automated Lugano Staging, where the spatial distribution of involved regions is translated into Lugano stages and therapeutic groups (Limited vs. Advanced Stage). The system was trained on the public autoPET dataset (n=1,007) and externally validated on an independent cohort of 67 patients. Performance was assessed using accuracy, sensitivity, specificity, and F1-score for regional involvement detection and staging agreement. Results: On the external validation set, the proposed model demonstrated robust performance, achieving an overall accuracy of 88.31%, sensitivity of 74.47%, specificity of 94.21%, and an F1-score of 80.80% for regional involvement detection, outperforming baseline models. Most notably, for the critical clinical task of therapeutic stratification (Limited vs. Advanced Stage), the system achieved a high accuracy of 85.07%, with a specificity of 90.48% and a sensitivity of 82.61%. Conclusion: AutoLugano represents the first fully automated, end-to-end pipeline that translates a single baseline FDG-PET/CT scan into a complete Lugano stage. This study demonstrates its strong potential to assist in initial staging, treatment stratification, and clinical decision-making.
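The metrics quoted in this abstract all derive from a binary confusion matrix, and a minimal Python sketch can make the relationship concrete. The abstract does not report raw counts. The counts used below (TP=38, FN=8, TN=19, FP=2) are an assumption: they are one set of integers for a 67-patient cohort that reproduces the reported stratification percentages, with "Advanced Stage" treated as the positive class. The `Confusion` class and its field names are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (hypothetical helper, not from the paper)."""
    tp: int  # positives correctly flagged
    fp: int  # negatives wrongly flagged
    tn: int  # negatives correctly cleared
    fn: int  # positives missed

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def sensitivity(self) -> float:
        # True-positive rate (recall).
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True-negative rate.
        return self.tn / (self.tn + self.fp)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in count form.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# ASSUMED counts: inferred to be consistent with the reported percentages
# for n=67; the abstract itself gives only the percentages.
staging = Confusion(tp=38, fp=2, tn=19, fn=8)

for name in ("accuracy", "sensitivity", "specificity", "f1"):
    print(f"{name}: {getattr(staging, name) * 100:.2f}%")
```

With these assumed counts, accuracy, sensitivity, and specificity round to the 85.07%, 82.61%, and 90.48% quoted for the Limited vs. Advanced task. The F1 value printed here is not a figure from the abstract.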


Fine-tuning an ECG Foundation Model to Predict Coronary CT Angiography Outcomes

Xiao, Yujie, Tang, Gongzhen, Zhang, Deyun, Li, Jun, Nie, Guangkun, Wang, Haoyu, Huang, Shun, Liu, Tong, Zhao, Qinghao, Chen, Kangyin, Hong, Shenda

arXiv.org Artificial Intelligence

Coronary artery disease (CAD) remains a major global health burden. Accurate identification of the culprit vessel and assessment of stenosis severity are essential for guiding individualized therapy. Although coronary CT angiography (CCTA) is the first-line non-invasive modality for CAD diagnosis, its dependence on high-end equipment, radiation exposure, and strict patient cooperation limits large-scale use. With advances in artificial intelligence (AI) and the widespread availability of electrocardiography (ECG), AI-ECG offers a promising alternative for CAD screening. In this study, we developed an interpretable AI-ECG model to predict severe or complete stenosis of the four major coronary arteries on CCTA. On the internal validation set, the model's AUCs for the right coronary artery (RCA), left main coronary artery (LM), left anterior descending artery (LAD), and left circumflex artery (LCX) were 0.794, 0.818, 0.744, and 0.755, respectively; on the external validation set, the AUCs reached 0.749, 0.971, 0.667, and 0.727, respectively. Performance remained stable in a clinically normal-ECG subset, indicating robustness beyond overt ECG abnormalities. Subgroup analyses across demographic and acquisition-time strata further confirmed model stability. Risk stratification based on vessel-specific incidence thresholds showed consistent separation on calibration and cumulative event curves. Interpretability analyses revealed distinct waveform differences between high- and low-risk groups, highlighting key electrophysiological regions contributing to model decisions and offering new insights into the ECG correlates of coronary stenosis.


Masked Autoencoder Joint Learning for Robust Spitzoid Tumor Classification

Carretero, Ilán, Mahtani, Roshni, Perez-Deben, Silvia, González-Muñoz, José Francisco, Monteagudo, Carlos, Naranjo, Valery, del Amor, Rocío

arXiv.org Artificial Intelligence

Accurate diagnosis of spitzoid tumors (ST) is critical to ensure a favorable prognosis and to avoid both under- and over-treatment. Epigenetic data, particularly DNA methylation, provide a valuable source of information for this task. However, prior studies assume complete data, an unrealistic setting as methylation profiles frequently contain missing entries due to limited coverage and experimental artifacts. Our work challenges these favorable scenarios and introduces ReMAC, an extension of ReMasker designed to tackle classification tasks on high-dimensional data under complete and incomplete regimes. Evaluation on real clinical data demonstrates that ReMAC achieves strong and robust performance compared to competing classification methods in the stratification of ST. Code is available: https://github.com/roshni-mahtani/ReMAC.


Deep Pathomic Learning Defines Prognostic Subtypes and Molecular Drivers in Colorectal Cancer

Wang, Zisong, Wang, Xuanyu, Chen, Hang, Wang, Haizhou, Chen, Yuxin, Xu, Yihang, Yuan, Yunhe, Luo, Lihuan, Ling, Xitong, Liu, Xiaoping

arXiv.org Artificial Intelligence

Precise prognostic stratification of colorectal cancer (CRC) remains a major clinical challenge due to its high heterogeneity. The conventional TNM staging system is inadequate for personalized medicine. We aimed to develop and validate a novel multiple instance learning model, TDAM-CRC, using histopathological whole-slide images for accurate prognostic prediction and to uncover its underlying molecular mechanisms. We trained the model on the TCGA discovery cohort (n=581), validated it in an independent external cohort (n=1031), and further integrated multi-omics data to improve model interpretability and identify novel prognostic biomarkers. The results demonstrated that TDAM-CRC achieved robust risk stratification in both cohorts. Its predictive performance significantly outperformed the conventional clinical staging system and multiple state-of-the-art models. The TDAM-CRC risk score was confirmed as an independent prognostic factor in multivariable analysis. Multi-omics analysis revealed that the high-risk subtype is closely associated with metabolic reprogramming and an immunosuppressive tumor microenvironment. Through interaction network analysis, we identified and validated Mitochondrial Ribosomal Protein L37 (MRPL37) as a key hub gene linking deep pathomic features to clinical prognosis. We found that high expression of MRPL37, driven by promoter hypomethylation, serves as an independent biomarker of favorable prognosis. Finally, we constructed a nomogram incorporating the TDAM-CRC risk score and clinical factors to provide a precise and interpretable clinical decision-making tool for CRC patients. Our AI-driven pathological model TDAM-CRC provides a robust tool for improved CRC risk stratification, reveals new molecular targets, and facilitates personalized clinical decision-making.




We are glad that all reviewers appreciated the soundness of our work, the importance of the hidden stratification (HS)

Neural Information Processing Systems

[...] ERM model to obtain a feature representation and then trains a second, robust model. With tuning of learning rate schedules and other hyperparameters (HPs), GEORGE's cost could be further reduced. [...] D.4, we define "inherent hardness" as the minimum possible worst-case subclass [...] We hope that building on this method may also be of independent interest. Our results are fairly insensitive (no significant performance drop) to reasonable variation in these HPs. Additional classification metrics (ISIC omitted for space).


7a674153c63cff1ad7f0e261c369ab2c-Supplemental.pdf

Neural Information Processing Systems

This is the appendix for "A mathematical model for automatic differentiation in machine learning". We propose to study backward mode of AD, as implemented for nonsmooth functions by standard software (e.g. [...]). Our theoretical results model AD as implemented in current machine learning libraries. [...] The conclusion follows because $f(y)$ [...] $f(x) =$ [...] For each $i = 1, \ldots, m$ and $j = 1, \ldots, l$, consider the set $U$ [...] We recall here the results of geometry that we use in the present work. The simplest o-minimal structure is given by the class of real semialgebraic objects. The following can be found for example in [21]. [...] $D(x) = \{\operatorname{grad} f(x)\}$, (10) where $\operatorname{grad} f(x)$ is the gradient of $f$ restricted to the active strata $M$ [...] Then the following are equivalent: [...] $D$ is conservative for $f$.
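To illustrate why backward-mode AD on nonsmooth functions needs a model of its own, the following Python sketch builds a toy reverse-mode differentiator and applies it to relu(x) - relu(-x), which equals x everywhere. It is not taken from the appendix. The `Var` class, the `backward` and `ad_derivative` helpers, and the convention relu'(0) = 0 are all assumptions made for illustration, though that convention is a common choice in practice.

```python
class Var:
    """Scalar node in a toy reverse-mode AD graph (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent node, local derivative)
        self.grad = 0.0

    def __sub__(self, other):
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __neg__(self):
        return Var(-self.value, ((self, -1.0),))


def relu(x):
    # ASSUMED convention: the "derivative" of relu at 0 is taken to be 0.
    slope = 1.0 if x.value > 0 else 0.0
    return Var(max(x.value, 0.0), ((x, slope),))


def backward(out):
    """Accumulate d(out)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # post-order: parents come before children

    visit(out)
    out.grad = 1.0
    for node in reversed(order):  # children are processed before parents
        for parent, local in node.parents:
            parent.grad += node.grad * local


def ad_derivative(f, x0):
    x = Var(x0)
    backward(f(x))
    return x.grad


def identity_via_relu(x):
    # Equals x for every real x, so the true derivative is 1 everywhere.
    return relu(x) - relu(-x)


for point in (-2.0, 0.0, 1.0):
    print(point, ad_derivative(identity_via_relu, point))
```

Away from the origin the toy differentiator returns 1, matching the true derivative. At x = 0 it returns 0, because both relu branches contribute a zero slope under the assumed convention. The output of AD on a nonsmooth program can therefore depend on how the program is written and not only on the function it computes.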


Affordable EEG, Actionable Insights: An Open Dataset and Evaluation Framework for Epilepsy Patient Stratification

Tabib, HM Shadman, Adil, Md. Hasnaen, Rahman, Ayesha, Swapnil, Ahmmad Nur, Hasana, Maoyejatun, Chowdhury, Ahmed Hossain, Islam, A. B. M. Alim Al

arXiv.org Artificial Intelligence

Access to clinical multi-channel EEG remains limited in many regions worldwide. We present NEUROSKY-EPI, the first open dataset of single-channel, consumer-grade EEG for epilepsy, collected in a South Asian clinical setting along with rich contextual metadata. To explore its utility, we introduce EmbedCluster, a patient-stratification pipeline that transfers representations from EEGNet models trained on clinical data and enriches them with contextual autoencoder embeddings, followed by unsupervised clustering of patients based on EEG patterns. Results show that low-cost, single-channel data can support meaningful stratification. Beyond algorithmic performance, we emphasize human-centered concerns such as deployability in resource-constrained environments, interpretability for non-specialists, and safeguards for privacy, inclusivity, and bias. By releasing the dataset and code, we aim to catalyze interdisciplinary research across health technology, human-computer interaction, and machine learning, advancing the goal of affordable and actionable EEG-based epilepsy care.


Multi-Scale Manifold Alignment for Interpreting Large Language Models: A Unified Information-Geometric Framework

Zhang, Yukun, Dong, Qi

arXiv.org Artificial Intelligence

We present Multi-Scale Manifold Alignment (MSMA), an information-geometric framework that decomposes LLM representations into local, intermediate, and global manifolds and learns cross-scale mappings that preserve geometry and information. Across GPT-2, BERT, RoBERTa, and T5, we observe consistent hierarchical patterns and find that MSMA improves alignment metrics under multiple estimators (e.g., relative KL reduction and MI gains with statistical significance across seeds). Controlled interventions at different scales yield distinct and architecture-dependent effects on lexical diversity, sentence structure, and discourse coherence. While our theoretical analysis relies on idealized assumptions, the empirical results suggest that multi-objective alignment offers a practical lens for analyzing cross-scale information flow and guiding representation-level control.