Goto

Collaborating Authors

 complication


Finger-prick diabetes blood test could be early warning for children

BBC News

All UK children could be offered screening for type 1 diabetes using a simple finger-prick blood test, say researchers who have been running a large study. Currently, many young people go undiagnosed and risk developing a life-threatening complication called diabetic ketoacidosis that needs urgent hospital treatment. Identifying diabetes earlier could help avoid this and mean treatments to control problematic blood sugar levels can be given sooner. Some 17,000 children aged three to 13 have already been checked as part of the ELSA (Early Surveillance for Autoimmune diabetes) study, funded by diabetes charities. Imogen, who is 12 and from the West Midlands, is one of those found to have diabetes thanks to the screening.


Medical Test-free Disease Detection Based on Big Data

Zhao, Haokun, Bai, Yingzhe, Xu, Qingyang, Zhou, Lixin, Chen, Jianxin, Fan, Jicong

arXiv.org Artificial Intelligence

Accurate disease detection is of paramount importance for effective medical treatment and patient care. However, the process of disease detection is often associated with extensive medical testing and considerable costs, making it impractical to perform all possible medical tests on a patient to diagnose or predict hundreds or thousands of diseases. In this work, we propose Collaborative Learning for Disease Detection (CLDD), a novel graph-based deep learning model that formulates disease detection as a collaborative learning task by exploiting associations among diseases and similarities among patients adaptively. CLDD integrates patient-disease interactions and demographic features from electronic health records to detect hundreds or thousands of diseases for every patient, with little to no reliance on the corresponding medical tests. Extensive experiments on a processed version of the MIMIC-IV dataset comprising 61,191 patients and 2,000 diseases demonstrate that CLDD consistently outperforms representative baselines across multiple metrics, achieving a 6.33\% improvement in recall and 7.63\% improvement in precision. Furthermore, case studies on individual patients illustrate that CLDD can successfully recover masked diseases within its top-ranked predictions, demonstrating both interpretability and reliability in disease prediction. By reducing diagnostic costs and improving accessibility, CLDD holds promise for large-scale disease screening and social health security.


Developing Fairness-Aware Task Decomposition to Improve Equity in Post-Spinal Fusion Complication Prediction

Yuan, Yining, Tamo, J. Ben, Shi, Wenqi, Zhong, Yishan, Nnamdi, Micky C., Brenn, B. Randall, Hwang, Steven W., Wang, May D.

arXiv.org Artificial Intelligence

Fairness in clinical prediction models remains a persistent challenge, particularly in high-stakes applications such as spinal fusion surgery for scoliosis, where patient outcomes exhibit substantial heterogeneity. Many existing fairness approaches rely on coarse demographic adjustments or post-hoc corrections, which fail to capture the latent structure of clinical populations and may unintentionally reinforce bias. We propose FAIR-MTL, a fairness-aware multitask learning framework designed to provide equitable and fine-grained prediction of postoperative complication severity. Instead of relying on explicit sensitive attributes during model training, FAIR-MTL employs a data-driven subgroup inference mechanism. We extract a compact demographic embedding, and apply k-means clustering to uncover latent patient subgroups that may be differentially affected by traditional models. These inferred subgroup labels determine task routing within a shared multitask architecture. During training, subgroup imbalance is mitigated through inverse-frequency weighting, and regularization prevents overfitting to smaller groups. Applied to postoperative complication prediction with four severity levels, FAIR-MTL achieves an AUC of 0.86 and an accuracy of 75%, outperforming single-task baselines while substantially reducing bias. For gender, the demographic parity difference decreases to 0.055 and equalized odds to 0.094; for age, these values reduce to 0.056 and 0.148, respectively. Model interpretability is ensured through SHAP and Gini importance analyses, which consistently highlight clinically meaningful predictors such as hemoglobin, hematocrit, and patient weight. Our findings show that incorporating unsupervised subgroup discovery into a multitask framework enables more equitable, interpretable, and clinically actionable predictions for surgical risk stratification.


Generating Natural-Language Surgical Feedback: From Structured Representation to Domain-Grounded Evaluation

Nasriddinov, Firdavs, Kocielnik, Rafal, Anandkumar, Anima, Hung, Andrew J.

arXiv.org Artificial Intelligence

High-quality intraoperative feedback from a surgical trainer is pivotal for improving trainee performance and long-term skill acquisition. Automating natural, trainer-style feedback promises timely, accessible, and consistent guidance at scale but requires models that understand clinically relevant representations. We present a structure-aware pipeline that learns a surgical action ontology from real trainer-to-trainee transcripts (33 surgeries) and uses it to condition feedback generation. We contribute by (1) mining Instrument-Action-Target (IAT) triplets from real-world feedback text and clustering surface forms into normalized categories, (2) fine-tuning a video-to-IAT model that leverages the surgical procedure and task contexts as well as fine-grained temporal instrument motion, and (3) demonstrating how to effectively use IAT triplet representations to guide GPT-4o in generating clinically grounded, trainer-style feedback. We show that, on Task 1: Video-to-IAT recognition, our context injection and temporal tracking deliver consistent AUC gains (Instrument: 0.67 to 0.74; Action: 0.60 to 0.63; Tissue: 0.74 to 0.79). For Task 2: feedback text generation (rated on a 1-5 fidelity rubric where 1 = opposite/unsafe, 3 = admissible, and 5 = perfect match to a human trainer), GPT-4o from video alone scores 2.17, while IAT conditioning reaches 2.44 (+12.4%), doubling the share of admissible generations with score >= 3 from 21% to 42%. Traditional text-similarity metrics also improve: word error rate decreases by 15-31% and ROUGE (phrase/substring overlap) increases by 9-64%. Grounding generation in explicit IAT structure improves fidelity and yields clinician-verifiable rationales, supporting auditable use in surgical training.


Reviewer # 1: 2

Neural Information Processing Systems

We thank the reviewers for the insightful reviews and valuable suggestions. We address the comments as follows. Provide proof of Lemma 1: The proof of Lemma 1 uses induction. We will add the proof to the supplementary. Y es, as defined in Section 2.1.


Explainable artificial intelligence model predicting the risk of all-cause mortality in patients with type 2 diabetes mellitus

Vershinina, Olga, Sabbatinelli, Jacopo, Bonfigli, Anna Rita, Colombaretti, Dalila, Giuliani, Angelica, Krivonosov, Mikhail, Trukhanov, Arseniy, Franceschi, Claudio, Ivanchenko, Mikhail, Olivieri, Fabiola

arXiv.org Artificial Intelligence

Objective. Type 2 diabetes mellitus (T2DM) is a highly prevalent non-communicable chronic disease that substantially reduces life expectancy. Accurate estimation of all-cause mortality risk in T2DM patients is crucial for personalizing and optimizing treatment strategies. Research Design and Methods. This study analyzed a cohort of 554 patients (aged 40-87 years) with diagnosed T2DM over a maximum follow-up period of 16.8 years, during which 202 patients (36%) died. Key survival-associated features were identified, and multiple machine learning (ML) models were trained and validated to predict all-cause mortality risk. To improve model interpretability, Shapley additive explanations (SHAP) was applied to the best-performing model. Results. The extra survival trees (EST) model, incorporating ten key features, demonstrated the best predictive performance. The model achieved a C-statistic of 0.776, with the area under the receiver operating characteristic curve (AUC) values of 0.86, 0.80, 0.841, and 0.826 for 5-, 10-, 15-, and 16.8-year all-cause mortality predictions, respectively. The SHAP approach was employed to interpret the model's individual decision-making processes. Conclusions. The developed model exhibited strong predictive performance for mortality risk assessment. Its clinically interpretable outputs enable potential bedside application, improving the identification of high-risk patients and supporting timely treatment optimization.


Evaluation and Implementation of Machine Learning Algorithms to Predict Early Detection of Kidney and Heart Disease in Diabetic Patients

Hasnain, Syed Ibad

arXiv.org Artificial Intelligence

Cardiovascular disease and chronic kidney disease are major complications of diabetes, leading to high morbidity and mortality. Early detection of these conditions is critical, yet traditional diagnostic markers often lack sensitivity in the initial stages. This study integrates conventional statistical methods with machine learning approaches to improve early diagnosis of CKD and CVD in diabetic patients. Descriptive and inferential statistics were computed in SPSS to explore associations between diseases and clinical or demographic factors. Patients were categorized into four groups: Group A both CKD and CVD, Group B CKD only, Group C CVD only, and Group D no disease. Statistical analysis revealed significant correlations: Serum Creatinine and Hypertension with CKD, and Cholesterol, Triglycerides, Myocardial Infarction, Stroke, and Hypertension with CVD. These results guided the selection of predictive features for machine learning models. Logistic Regression, Support Vector Machine, and Random Forest algorithms were implemented, with Random Forest showing the highest accuracy, particularly for CKD prediction. Ensemble models outperformed single classifiers in identifying high-risk diabetic patients. SPSS results further validated the significance of the key parameters integrated into the models. While challenges such as interpretability and class imbalance remain, this hybrid statistical machine learning framework offers a promising advancement toward early detection and risk stratification of diabetic complications compared to conventional diagnostic approaches.


KEEP: Integrating Medical Ontologies with Clinical Data for Robust Code Embeddings

Elhussein, Ahmed, Meddeb, Paul, Newbury, Abigail, Mirone, Jeanne, Stoll, Martin, Gursoy, Gamze

arXiv.org Artificial Intelligence

Machine learning in healthcare requires effective representation of structured medical codes, but current methods face a trade-off: knowledge graph-based approaches capture formal relationships but miss real-world patterns, while data-driven methods learn empirical associations but often overlook structured knowledge in medical terminologies. We present KEEP (Knowledge-preserving and Empirically refined Embedding Process), an efficient framework that bridges this gap by combining knowledge graph embeddings with adaptive learning from clinical data. KEEP first generates embeddings from knowledge graphs, then employs regularized training on patient records to adaptively integrate empirical patterns while preserving ontological relationships. Importantly, KEEP produces final embeddings without task-specific axillary or end-to-end training enabling KEEP to support multiple downstream applications and model architectures. Evaluations on structured EHR from UK Biobank and MIMIC-IV demonstrate that KEEP outperforms both traditional and Language Model-based approaches in capturing semantic relationships and predicting clinical outcomes. Moreover, KEEP's minimal computational requirements make it particularly suitable for resource-constrained environments. Data and Code Availability This research has been conducted using data from UK Biobank (Sud-low et al., 2015) and MIMIC-IV Johnson et al. (2021). Researchers can request access via https:// www.ukbiobank.ac.uk/ and https://physionet.


Author response: 'Stationary Activations for Uncertainty Calibration in Deep Learning ' NeurIPS: # 5154

Neural Information Processing Systems

We thank the anonymous reviewers for their enthusiasm and detailed comments on the manuscript. We start by addressing R2's concerns as they had the lowest score. This link is explicit as shown in this paper. The choice of kernel/activation function is up to the modelling task and'expert We merely provide a building block. The Matérn is a widely used prior, and worth adding to the NN tool set.


Causal Machine Learning for Surgical Interventions

Tamo, J. Ben, Chouhan, Nishant S., Nnamdi, Micky C., Yuan, Yining, Chivilkar, Shreya S., Shi, Wenqi, Hwang, Steven W., Brenn, B. Randall, Wang, May D.

arXiv.org Artificial Intelligence

Surgical decision-making is complex and requires understanding causal relationships between patient characteristics, interventions, and outcomes. In high-stakes settings like spinal fusion or scoliosis correction, accurate estimation of individualized treatment effects (ITEs) remains limited due to the reliance on traditional statistical methods that struggle with complex, heterogeneous data. In this study, we develop a multi-task meta-learning framework, X-MultiTask, for ITE estimation that models each surgical decision (e.g., anterior vs. posterior approach, surgery vs. no surgery) as a distinct task while learning shared representations across tasks. To strengthen causal validity, we incorporate the inverse probability weighting (IPW) into the training objective. We evaluate our approach on two datasets: (1) a public spinal fusion dataset (1,017 patients) to assess the effect of anterior vs. posterior approaches on complication severity; and (2) a private AIS dataset (368 patients) to analyze the impact of posterior spinal fusion (PSF) vs. non-surgical management on patient-reported outcomes (PROs). Our model achieves the highest average AUC (0.84) in the anterior group and maintains competitive performance in the posterior group (0.77). It outperforms baselines in treatment effect estimation with the lowest overall $ε_{\text{NN-PEHE}}$ (0.2778) and $ε_{\text{ATE}}$ (0.0763). Similarly, when predicting PROs in AIS, X-MultiTask consistently shows superior performance across all domains, with $ε_{\text{NN-PEHE}}$ = 0.2551 and $ε_{\text{ATE}}$ = 0.0902. By providing robust, patient-specific causal estimates, X-MultiTask offers a powerful tool to advance personalized surgical care and improve patient outcomes. The code is available at https://github.com/Wizaaard/X-MultiTask.