 survival


fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R

Korkmaz, Selcuk, Goksuluk, Dincer, Karaismailoglu, Eda

arXiv.org Machine Learning

Preprocessing leakage arises when scaling, imputation, or other data-dependent transformations are estimated before resampling, inflating apparent performance while remaining hard to detect. We present fastml, an R package that provides a single-call interface for leakage-aware machine learning through guarded resampling, where preprocessing is re-estimated inside each resample and applied to the corresponding assessment data. The package supports grouped and time-ordered resampling, blocks high-risk configurations, audits recipes for external dependencies, and includes sandboxed execution and integrated model explanation. We evaluate fastml with a Monte Carlo simulation contrasting global and fold-local normalization, a usability comparison with tidymodels under matched specifications, and survival benchmarks across datasets of different sizes. The simulation demonstrates that global preprocessing substantially inflates apparent performance relative to guarded resampling. fastml matched held-out performance obtained with tidymodels while reducing workflow orchestration, and it supported consistent benchmarking of multiple survival model classes through a unified interface.
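To make the leakage mechanism concrete, here is a minimal Python sketch that contrasts a scaler estimated once on all rows with one re-estimated inside each resample and applied only to that resample's assessment rows. fastml itself is an R package, and none of the names below are its API. The helpers `kfold_indices`, `fit_scaler`, `apply_scaler`, `guarded_scaled_folds`, and `leaky_scaled_folds` are hypothetical, and the interleaved fold assignment is an assumption chosen for readability.

```python
import statistics


def kfold_indices(n, k):
    """Split row indices 0..n-1 into k interleaved folds (no shuffling).

    Returns a list of (analysis_idx, assessment_idx) pairs, one per fold.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, assess in enumerate(folds):
        analysis = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((analysis, assess))
    return splits


def fit_scaler(values):
    """Estimate centering/scaling parameters from the given values only."""
    return statistics.mean(values), statistics.stdev(values)


def apply_scaler(values, params):
    """Standardize values with previously estimated (mean, sd)."""
    mean, sd = params
    return [(v - mean) / sd for v in values]


def guarded_scaled_folds(x, k):
    """Guarded: re-estimate the scaler on each fold's analysis rows,
    then apply it to that fold's assessment rows."""
    out = []
    for analysis, assess in kfold_indices(len(x), k):
        params = fit_scaler([x[j] for j in analysis])
        out.append(apply_scaler([x[j] for j in assess], params))
    return out


def leaky_scaled_folds(x, k):
    """Leaky: the scaler is estimated once on ALL rows, so assessment rows
    influence their own preprocessing."""
    params = fit_scaler(x)
    return [apply_scaler([x[j] for j in assess], params)
            for _, assess in kfold_indices(len(x), k)]
```

For example, `apply_scaler([4.0], fit_scaler([1.0, 2.0, 3.0]))` returns `[2.0]`: the assessment value is scaled with statistics from the analysis rows alone. In the leaky variant the assessment rows contribute to the mean and standard deviation, which is exactly the dependency that guarded resampling removes.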


Nonparametric Regression Discontinuity Designs with Survival Outcomes

Schuessler, Maximilian, Sverdrup, Erik, Tibshirani, Robert, Wager, Stefan

arXiv.org Machine Learning

Quasi-experimental evaluations are central for generating real-world causal evidence and complementing insights from randomized trials. The regression discontinuity design (RDD) is a quasi-experimental design that can be used to estimate the causal effect of treatments that are assigned based on a running variable crossing a threshold. Such threshold-based rules are ubiquitous in healthcare, where predictive and prognostic biomarkers frequently guide treatment decisions. However, standard RD estimators rely on complete outcome data, an assumption often violated in time-to-event analyses where censoring arises from loss to follow-up. To address this issue, we propose a nonparametric approach that leverages doubly robust censoring corrections and can be paired with existing RD estimators. Our approach can handle multiple survival endpoints, long follow-up times, and covariate-dependent variation in survival and censoring. We discuss the relevance of our approach across multiple application areas and demonstrate its usefulness through simulations and the prostate component of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial, where our new approach offers several advantages, including higher efficiency and robustness to misspecification. We have also developed an open-source software package, $\texttt{rdsurvival}$, for the $\texttt{R}$ language.
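The abstract does not spell out the estimator, and a doubly robust correction is beyond a short example. As a rough illustration of the general idea of pairing a censoring correction with an off-the-shelf RD estimator, the sketch below uses the simpler inverse-probability-of-censoring weighting (IPCW), which doubly robust corrections build on, followed by a plain sharp RD estimate using local linear fits with a uniform kernel. This is not the authors' method or the `rdsurvival` API; all function names are hypothetical, and censoring is assumed independent of covariates, which the paper's approach explicitly relaxes.

```python
def censoring_km(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    times: observed follow-up times; events: 1 = event observed, 0 = censored.
    Censorings are treated as the 'events' of this estimator; ties between
    events and censorings get no special handling in this sketch.
    """
    steps = []
    g = 1.0
    for s in sorted({t for t, e in zip(times, events) if e == 0}):
        at_risk = sum(1 for t in times if t >= s)
        censored = sum(1 for t, e in zip(times, events) if t == s and e == 0)
        g *= 1.0 - censored / at_risk
        steps.append((s, g))

    def G(t):
        # Right-continuous step function: value of the last step at or before t.
        value = 1.0
        for s, v in steps:
            if s > t:
                break
            value = v
        return value

    return G


def ipcw_pseudo_outcomes(times, events, horizon):
    """IPCW pseudo-outcome for the survival indicator 1{T > horizon}.

    Units observed beyond the horizon get 1 / G(horizon); units with an
    event or censoring at or before the horizon get 0. Assumes censoring
    is independent of covariates (a simplification).
    """
    g = censoring_km(times, events)(horizon)
    if g <= 0:
        raise ValueError("censoring survival is zero at the horizon")
    return [(1.0 if t > horizon else 0.0) / g for t in times]


def _fit_at_cutoff(xs, ys, cutoff):
    """OLS of y on (x - cutoff); returns the fitted value at x == cutoff."""
    d = [x - cutoff for x in xs]
    n = len(d)
    mean_d = sum(d) / n
    mean_y = sum(ys) / n
    sxx = sum((di - mean_d) ** 2 for di in d)
    sxy = sum((di - mean_d) * (yi - mean_y) for di, yi in zip(d, ys))
    return mean_y - (sxy / sxx) * mean_d


def sharp_rd_estimate(running, outcome, cutoff, bandwidth):
    """Sharp RD: local linear fits (uniform kernel) on each side of the
    cutoff; the effect estimate is the jump in fitted values at the cutoff."""
    pairs = list(zip(running, outcome))
    left = [(x, y) for x, y in pairs if cutoff - bandwidth <= x < cutoff]
    right = [(x, y) for x, y in pairs if cutoff <= x <= cutoff + bandwidth]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("need at least two points on each side of the cutoff")
    mu_left = _fit_at_cutoff([x for x, _ in left], [y for _, y in left], cutoff)
    mu_right = _fit_at_cutoff([x for x, _ in right], [y for _, y in right], cutoff)
    return mu_right - mu_left
```

With times `[1, 2, 3, 4]`, events `[1, 0, 1, 1]`, and horizon 2.5, the pseudo-outcomes average to 0.75, which matches the Kaplan-Meier survival estimate at 2.5 for those data. Passing such pseudo-outcomes as `outcome` to `sharp_rd_estimate` gives a censoring-adjusted RD estimate under the independent-censoring assumption; a doubly robust version would add an outcome-model augmentation term, omitted here.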


'I didn't have anything to prove': what Traitors finalist Jade Scott learned about survival from video games

The Guardian

The latest series of The Traitors, which ended last week on a nail-biting finale, featured some of the usual characters - from guileless extroverts to wannabe Columbos endlessly observing fellow contestants for the slightest flicker of treachery. But one faithful stood out for her quiet determination, despite a ceaseless onslaught of suspicion and accusation. That person was Jade Scott, and I wasn't at all surprised when, quite early on in the series, she revealed she was a keen gamer.


Female mice often have multiple sexual partners--for survival

Popular Science

Birthing a litter with several fathers may help when food is scarce. If a female house mouse mates with multiple male house mice, her litter could have multiple fathers. Polyandry, as this mating practice is called, is common for various species. Yet scientists are still investigating its purpose and the potential benefits of birthing half siblings within the same litter.


Why humans live and die for love

Popular Science

A new book explores how humans evolved to be wired for intimacy. It can save our lives. Intimate relationships provide stability, safety, and reassurance, especially when we are in pain. Adapted from THE INTIMATE ANIMAL by Justin Garcia, PhD. Used with permission of Little, Brown Spark, an imprint of Little, Brown and Company. Jen and Dave's second child was born in November 2002. Two weeks later, on a cold Thursday night, the phone rang.


Why some animals eat their babies

Popular Science

Animal filial cannibalism has been documented in fish, insects, even domestic pets. Scientists still don't fully understand why some animals eat their own offspring. "In general, cannibalism of offspring is super widespread," says Aneesh Bose, a behavioral ecologist at the Swedish University of Agricultural Sciences in Uppsala, Sweden. Bose has long studied the phenomenon of animals who turn from child-rearing to child-eating, and in 2022, he authored a review of prior research on the topic.


Associating Healthcare Teamwork with Patient Outcomes for Predictive Analysis

Lu, Hsiao-Ying, Ma, Kwan-Liu

arXiv.org Artificial Intelligence

Cancer treatment outcomes are influenced not only by clinical and demographic factors but also by the collaboration of healthcare teams. However, prior work has largely overlooked the potential role of human collaboration in shaping patient survival. This paper presents an applied AI approach to uncovering the impact of healthcare professionals' (HCPs) collaboration--captured through electronic health record (EHR) systems--on cancer patient outcomes. We model EHR-mediated HCP interactions as networks and apply machine learning techniques to detect predictive signals of patient survival embedded in these collaborations. Our models are cross-validated to ensure generalizability, and we explain the predictions by identifying key network traits associated with improved outcomes. Importantly, clinical experts and the published literature validate the relevance of the identified crucial collaboration traits, reinforcing their potential for real-world applications. This work contributes to a practical workflow for leveraging digital traces of collaboration and AI to assess and improve team-based healthcare. The approach is potentially transferable to other domains involving complex collaboration and offers actionable insights to support data-informed interventions in healthcare delivery.
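The abstract does not say how EHR interactions become networks, so the following is only a minimal sketch under an assumed rule: each record is a hypothetical `(hcp_id, patient_id, day)` access event, and two HCPs are linked in a patient's network if both accessed that patient's record on the same day. The resulting per-patient traits (team size, link count, density, maximum degree) are illustrative examples of network features that could be fed to a survival classifier, not the traits the authors identified.

```python
from collections import defaultdict
from itertools import combinations


def patient_collaboration_networks(access_log):
    """Build one undirected HCP network per patient.

    access_log: iterable of (hcp_id, patient_id, day) records (assumed format).
    Two HCPs are linked if both accessed the same patient's record on the
    same day (an assumed co-access rule).
    """
    by_patient_day = defaultdict(set)
    nodes = defaultdict(set)
    for hcp, patient, day in access_log:
        by_patient_day[(patient, day)].add(hcp)
        nodes[patient].add(hcp)
    edges = defaultdict(set)
    for (patient, _day), hcps in by_patient_day.items():
        for a, b in combinations(sorted(hcps), 2):
            edges[patient].add((a, b))
    return {p: (nodes[p], edges[p]) for p in nodes}


def network_features(nodes, edges):
    """Simple per-patient network traits usable as model predictors."""
    n = len(nodes)
    possible = n * (n - 1) / 2
    degree = {v: 0 for v in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {
        "n_hcps": n,
        "n_links": len(edges),
        "density": len(edges) / possible if possible else 0.0,
        "max_degree": max(degree.values()) if degree else 0,
    }
```

For instance, if a nurse and an oncologist open a patient's chart on day 1 and the same oncologist and a radiologist open it on day 2, that patient's network has three HCPs, two links, and density 2/3, with the oncologist as the most connected member.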


Copula Based Fusion of Clinical and Genomic Machine Learning Risk Scores for Breast Cancer Risk Stratification

Aich, Agnideep, Hewage, Sameera, Murshed, Md Monzur

arXiv.org Machine Learning

Clinical and genomic models are both used to predict breast cancer outcomes, but they are often combined using simple linear rules that do not account for how their risk scores relate, especially at the extremes. Using the METABRIC breast cancer cohort, we studied whether directly modeling the joint relationship between clinical and genomic machine learning risk scores could improve risk stratification for 5-year cancer-specific mortality. We created a binary 5-year cancer-death outcome and defined two sets of predictors: a clinical set (demographic, tumor, and treatment variables) and a genomic set (gene-expression $z$-scores). We trained several supervised classifiers, such as Random Forest and XGBoost, and used 5-fold cross-validated predicted probabilities as unbiased risk scores. These scores were converted to pseudo-observations on $(0,1)^2$ to fit Gaussian, Clayton, and Gumbel copulas. Clinical models showed good discrimination (AUC 0.783), while genomic models had moderate performance (AUC 0.681). The joint distribution was best captured by a Gaussian copula (bootstrap $p=0.997$), which suggests a symmetric, moderately strong positive relationship. When we grouped patients based on this relationship, Kaplan-Meier curves showed clear differences: patients who were high-risk in both clinical and genomic scores had much poorer survival than those high-risk in only one set. These results show that copula-based fusion works in real-world cohorts and that considering dependencies between scores can better identify patient subgroups with the worst prognosis.
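The fusion step rests on turning each pair of risk scores into pseudo-observations on $(0,1)^2$ and then fitting one-parameter copulas. The sketch below uses the common rank/(n+1) transform and estimates each family's parameter by inverting Kendall's tau ($\rho = \sin(\pi\tau/2)$ for the Gaussian, $\theta = 2\tau/(1-\tau)$ for Clayton, $\theta = 1/(1-\tau)$ for Gumbel). The abstract does not say which estimation method the authors used (maximum likelihood is another common choice), and the median split in `risk_quadrant` is a hypothetical stand-in for their grouping rule. All function names are illustrative.

```python
import math


def pseudo_observations(values):
    """Map scores to (0, 1) via rank / (n + 1); assumes no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def kendall_tau(x, y):
    """Kendall's tau-a (no tie correction), O(n^2)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def copula_params_from_tau(tau):
    """Kendall's-tau inversion for three one-parameter copula families.

    Assumes 0 <= tau < 1 (positive dependence, as the abstract reports).
    """
    return {
        "gaussian_rho": math.sin(math.pi * tau / 2),
        "clayton_theta": 2 * tau / (1 - tau),
        "gumbel_theta": 1 / (1 - tau),
    }


def risk_quadrant(u_clin, u_gen, threshold=0.5):
    """Label patients as high-risk on both, one, or neither score
    (a hypothetical threshold split on the pseudo-observations)."""
    labels = []
    for a, b in zip(u_clin, u_gen):
        high_a, high_b = a > threshold, b > threshold
        if high_a and high_b:
            labels.append("high-both")
        elif high_a or high_b:
            labels.append("high-one")
        else:
            labels.append("low-both")
    return labels
```

As a small check, `kendall_tau([1, 2, 3], [1, 3, 2])` is 1/3, which maps to a Gaussian $\rho$ of 0.5, a Clayton $\theta$ of 1, and a Gumbel $\theta$ of 1.5. Tau inversion is convenient for comparing families on a common dependence scale, but it does not by itself provide the goodness-of-fit test behind the reported bootstrap $p$-value.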