control arm
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Middle East > Jordan (0.04)
Toward Valid Generative Clinical Trial Data with Survival Endpoints
Chassat, Perrine, Nguyen, Van Tuan, Ducrot, Lucas, Lanoy, Emilie, Guilloux, Agathe
Clinical trials face mounting challenges: fragmented patient populations, slow enrollment, and unsustainable costs, particularly for late phase trials in oncology and rare diseases. While external control arms built from real-world data have been explored, a promising alternative is the generation of synthetic control arms using generative AI. A central challenge is the generation of time-to-event outcomes, which constitute primary endpoints in oncology and rare disease trials, but are difficult to model under censoring and small sample sizes. Existing generative approaches, largely GAN-based, are data-hungry, unstable, and rely on strong assumptions such as independent censoring. We introduce a variational autoencoder (VAE) that jointly generates mixed-type covariates and survival outcomes within a unified latent variable framework, without assuming independent censoring. Across synthetic and real trial datasets, we evaluate our model in two realistic scenarios: (i) data sharing under privacy constraints, where synthetic controls substitute for original data, and (ii) control-arm augmentation, where synthetic patients mitigate imbalances between treated and control groups. Our method outperforms GAN baselines on fidelity, utility, and privacy metrics, while revealing systematic miscalibration of type I error and power. We propose a post-generation selection procedure that improves calibration, highlighting both progress and open challenges for generative survival modeling.
- North America > United States (0.14)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > France > Normandy > Seine-Maritime > Rouen (0.04)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- (2 more...)
Machine Learning-Based Manufacturing Cost Prediction from 2D Engineering Drawings via Geometric Features
Arıkan, Ahmet Bilal, Özönder, Şener, Koçyiğit, Mustafa Taha, Altun, Hüseyin Oktay, Küçükkartal, H. Kübra, Arslanoğlu, Murat, Çağırankaya, Fatih, Ayvaz, Berk
We present an integrated machine learning framework that transforms how manufacturing cost is estimated from 2D engineering drawings. Unlike traditional quotation workflows that require labor-intensive process planning, our approach about 200 geometric and statistical descriptors directly from 13,684 DWG drawings of automotive suspension and steering parts spanning 24 product groups. Gradient-boosted decision tree models (XGBoost, CatBoost, LightGBM) trained on these features achieve nearly 10% mean absolute percentage error across groups, demonstrating robust scalability beyond part-specific heuristics. By coupling cost prediction with explainability tools such as SHAP, the framework identifies geometric design drivers including rotated dimension maxima, arc statistics and divergence metrics, offering actionable insights for cost-aware design. This end-to-end CAD-to-cost pipeline shortens quotation lead times, ensures consistent and transparent cost assessments across part families and provides a deployable pathway toward real-time, ERP-integrated decision support in Industry 4.0 manufacturing environments.
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
A novel language model for predicting serious adverse event results in clinical trials from their prospective registrations
Hu, Qixuan, Zhang, Xumou, Kim, Jinman, Bourgeois, Florence, Dunn, Adam G.
Objectives: With accurate estimates of expected safety results, clinical trials could be better designed and monitored. We evaluated methods for predicting serious adverse event (SAE) results in clinical trials using information only from their registrations prior to the trial. Material and Methods: We analyzed 22,107 two-arm parallel interventional clinical trials from ClinicalTrials.gov with structured summary results. Two prediction models were developed: a classifier predicting whether a greater proportion of participants in an experimental arm would have SAEs (area under the receiver operating characteristic curve; AUC) compared to the control arm, and a regression model to predict the proportion of participants with SAEs in the control arms (root mean squared error; RMSE). A transfer learning approach using pretrained language models (e.g., ClinicalT5, BioBERT) was used for feature extraction, combined with a downstream model for prediction. To maintain semantic representation in long trial texts exceeding localized language model input limits, a sliding window method was developed for embedding extraction. Results: The best model (ClinicalT5+Transformer+MLP) had 77.6% AUC when predicting which trial arm had a higher proportion of SAEs. When predicting SAE proportion in the control arm, the same model achieved RMSE of 18.6%. The sliding window approach consistently outperformed direct comparisons. Across 12 classifiers, the average absolute AUC increase was 2.00%, and absolute RMSE reduction was 1.58% across 12 regressors. Discussion: Summary results data from ClinicalTrials.gov remains underutilized. Predicted results of publicly reported trials provides an opportunity to identify discrepancies between expected and reported safety results.
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > District of Columbia > Washington (0.04)
Spline Dimensional Decomposition with Interpolation-based Optimal Knot Selection for Stochastic Dynamic Analysis
Kim, Yeonsu, Lee, Junhan, Wang, Bingran, Hwang, John T., Lee, Dongjin
Forward uncertainty quantification in dynamical systems is challenging due to non-smooth or locally oscillating nonlinear behaviors. Spline dimensional decomposition (SDD) addresses such nonlinearity by partitioning input coordinates via knot placement, but its accuracy is highly sensitive to internal knot locations. Optimizing knots using sequential quadratic programming is effective, yet computationally expensive. We propose a computationally efficient, interpolation-based method for optimal knot selection in SDD. The method includes: (1) interpolating input-output profiles, (2) defining subinterval-based reference regions, and (3) selecting knots at maximum gradient points within each region. The resulting knot vector is then applied to SDD for accurate approximation of non-smooth and oscillatory responses. A modal analysis of a lower control arm shows that SDD with the proposed knots yields higher accuracy than SDD with uniformly or randomly spaced knots and a Gaussian process model. In this example, the proposed SDD achieves the lowest relative variance error (2.89%) for the first natural frequency distribution, compared to uniformly spaced knots (12.310%), randomly spaced knots (15.274%), and Gaussian process (5.319%). All surrogates are constructed using the same 401 simulation datasets, and errors are evaluated against a 2000-sample Monte Carlo simulation. Scalability and applicability are demonstrated through stochastic and reliability analyses of one- and three-dimensional benchmark functions, and a ten-dimensional lower control arm model. Results confirm that second-moment statistics and reliability estimates can be accurately obtained with only a few hundred function evaluations or finite element simulations.
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
A framework for Multi-A(rmed)/B(andit) Testing with Online FDR Control
Fanny Yang, Aaditya Ramdas, Kevin G. Jamieson, Martin J. Wainwright
We propose an alternative framework to existing setups for controlling false alarms when multiple A/B tests are run over time. This setup arises in many practical applications, e.g. when pharmaceutical companies test new treatment options against control pills for different diseases, or when internet companies test their default webpages versus various alternatives over time. Our framework proposes to replace a sequence of A/B tests by a sequence of best-arm MAB instances, which can be continuously monitored by the data scientist. When interleaving the MAB tests with an online false discovery rate (FDR) algorithm, we can obtain the best of both worlds: low sample complexity and any time online FDR control. Our main contributions are: (i) to propose reasonable definitions of a null hypothesis for MAB instances; (ii) to demonstrate how one can derive an always-valid sequential p-value that allows continuous monitoring of each MAB test; and (iii) to show that using rejection thresholds of online-FDR algorithms as the confidence levels for the MAB algorithms results in both sample-optimality, high power and low FDR at any point in time. We run extensive simulations to verify our claims, and also report results on real data collected from the New Yorker Cartoon Caption contest.
- North America > United States > New York (0.24)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Middle East > Jordan (0.04)
Preliminary Study of the Impact of AI-Based Interventions on Health and Behavioral Outcomes in Maternal Health Programs
Dasgupta, Arpan, Boehmer, Niclas, Madhiwalla, Neha, Hedge, Aparna, Wilder, Bryan, Tambe, Milind, Taneja, Aparna
Automated voice calls are an effective method of delivering maternal and child health information to mothers in underserved communities. One method to fight dwindling listenership is through an intervention in which health workers make live service calls. Previous work has shown that we can use AI to identify beneficiaries whose listenership gets the greatest boost from an intervention. It has also been demonstrated that listening to the automated voice calls consistently leads to improved health outcomes for the beneficiaries of the program. These two observations combined suggest the positive effect of AI-based intervention scheduling on behavioral and health outcomes. This study analyzes the relationship between the two. Specifically, we are interested in mothers' health knowledge in the post-natal period, measured through survey questions. We present evidence that improved listenership through AI-scheduled interventions leads to a better understanding of key health issues during pregnancy and infancy. This improved understanding has the potential to benefit the health outcomes of mothers and their babies.
- Asia > India > Maharashtra > Mumbai (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.69)
- Health & Medicine > Public Health (1.00)
- Health & Medicine > Consumer Health (1.00)
- Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.85)
- Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.69)
Evaluating the Effectiveness of Index-Based Treatment Allocation
Boehmer, Niclas, Nair, Yash, Shah, Sanket, Janson, Lucas, Taneja, Aparna, Tambe, Milind
When resources are scarce, an allocation policy is needed to decide who receives a resource. This problem occurs, for instance, when allocating scarce medical resources and is often solved using modern ML methods. This paper introduces methods to evaluate index-based allocation policies -- that allocate a fixed number of resources to those who need them the most -- by using data from a randomized control trial. Such policies create dependencies between agents, which render the assumptions behind standard statistical tests invalid and limit the effectiveness of estimators. Addressing these challenges, we translate and extend recent ideas from the statistics literature to present an efficient estimator and methods for computing asymptotically correct confidence intervals. This enables us to effectively draw valid statistical conclusions, a critical gap in previous work. Our extensive experiments validate our methodology in practical settings, while also showcasing its statistical power. We conclude by proposing and empirically verifying extensions of our methodology that enable us to reevaluate a past randomized control trial to evaluate different ML allocation policies in the context of a mHealth program, drawing previously invisible conclusions.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > India (0.04)
- North America > United States > Wisconsin (0.04)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Epidemiology (0.67)
FedECA: A Federated External Control Arm Method for Causal Inference with Time-To-Event Data in Distributed Settings
Terrail, Jean Ogier du, Klopfenstein, Quentin, Li, Honghao, Mayer, Imke, Loiseau, Nicolas, Hallal, Mohammad, Balazard, Félix, Andreux, Mathieu
External control arms (ECA) can inform the early clinical development of experimental drugs and provide efficacy evidence for regulatory approval in non-randomized settings. However, the main challenge of implementing ECA lies in accessing real-world data or historical clinical trials. Indeed, data sharing is often not feasible due to privacy considerations related to data leaving the original collection centers, along with pharmaceutical companies' competitive motives. In this paper, we leverage a privacy-enhancing technology called federated learning (FL) to remove some of the barriers to data sharing. We introduce a federated learning inverse probability of treatment weighted (IPTW) method for time-to-event outcomes called FedECA which eases the implementation of ECA by limiting patients' data exposure. We show with extensive experiments that FedECA outperforms its closest competitor, matching-adjusted indirect comparison (MAIC), in terms of statistical power and ability to balance the treatment and control groups. To encourage the use of such methods, we publicly release our code which relies on Substra, an open-source FL software with proven experience in privacy-sensitive contexts.
- North America > United States > Virginia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Belgium (0.04)
- Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science (0.93)
- Information Technology > Software (0.88)