stratum
Data reuse enables cost-efficient randomized trials of medical AI models
Nercessian, Michael, Zhang, Wenxin, Schubert, Alexander, Yang, Daphne, Chung, Maggie, Alaa, Ahmed, Yala, Adam
Joint Senior Corresponding Author: Michael Nercessian Email: michael.nercessian@berkeley.edu Abstract Randomized controlled trials (RCTs) are indispensable for establishing the clinical value of medical artificial-intelligence (AI) tools, yet their high cost and long timelines hinder timely validation as new models emerge rapidly. Here, we propose BRIDGE, a data-reuse RCT design for AI-based risk models. AI risk models support a broad range of interventions, including screening, treatment selection, and clinical alerts. BRIDGE trials recycle participant-level data from completed trials of AI models when legacy and updated models make concordant predictions, thereby reducing the enrollment requirement for subsequent trials. We provide a practical checklist for investigators to assess whether reusing data from previous trials allows for valid causal inference and preserves type I error. Using real-world datasets across breast cancer, cardiovascular disease, and sepsis, we demonstrate concordance between successive AI models, with up to 64.8% overlap in top 5% high-risk cohorts. We then simulate a series of breast cancer screening studies, where our design reduced required enrollment by 46.6%--saving over US$2.8 million--while maintaining 80% power. By transforming trials into adaptive, modular studies, our proposed design makes Level I evidence generation feasible for every model iteration, thereby accelerating cost-effective translation of AI into routine care . Introduction Artificial intelligence (AI) models have the potential to transform patient care by identifying high-risk individuals using high-dimensional data--such as imaging, electronic health records, or time-series data--to personalize screening, prevention, and treatment decisions across a range of diseases, including cancer and heart disease.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.57)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Applied AI (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
Eliminating Negative Occurrences of Derived Predicates from PDDL Axioms
Grundke, Claudia, Röger, Gabriele
Axioms are a feature of the Planning Domain Definition Language PDDL that can be considered as a generalization of database query languages such as Datalog. The PDDL standard restricts negative occurrences of predicates in axiom bodies to predicates that are directly set by actions and not derived by axioms. In the literature, authors often deviate from this limitation and only require that the set of axioms is stratifiable. Both variants can express exactly the same queries as least fixed-point logic, indicating that negative occurrences of derived predicates can be eliminated. We present the corresponding transformation.
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving
Pan, Yue, Xia, Zihan, Hsu, Po-Kai, Hu, Lanxiang, Kim, Hyungyo, Sharda, Janak, Zhou, Minxuan, Kim, Nam Sung, Yu, Shimeng, Rosing, Tajana, Kang, Mingu
As Large Language Models (LLMs) continue to evolve, Mixture of Experts (MoE) architecture has emerged as a prevailing design for achieving state-of-the-art performance across a wide range of tasks. MoE models use sparse gating to activate only a handful of expert sub-networks per input, achieving billion-parameter capacity with inference costs akin to much smaller models. However, such models often pose challenges for hardware deployment due to the massive data volume introduced by the MoE layers. To address the challenges of serving MoE models, we propose Stratum, a system-hardware co-design approach that combines the novel memory technology Monolithic 3D-Stackable DRAM (Mono3D DRAM), near-memory processing (NMP), and GPU acceleration. The logic and Mono3D DRAM dies are connected through hybrid bonding, whereas the Mono3D DRAM stack and GPU are interconnected via silicon interposer. Mono3D DRAM offers higher internal bandwidth than HBM thanks to the dense vertical interconnect pitch enabled by its monolithic structure, which supports implementations of higher-performance near-memory processing. Furthermore, we tackle the latency differences introduced by aggressive vertical scaling of Mono3D DRAM along the z-dimension by constructing internal memory tiers and assigning data across layers based on access likelihood, guided by topic-based expert usage prediction to boost NMP throughput. The Stratum system achieves up to 8.29x improvement in decoding throughput and 7.66x better energy efficiency across various benchmarks compared to GPU baselines.
- North America > United States > California > San Diego County > La Jolla (0.14)
- North America > United States > Illinois > Champaign County > Urbana (0.14)
- Asia > South Korea > Seoul > Seoul (0.05)
- (9 more...)
- Information Technology (0.46)
- Semiconductors & Electronics (0.46)
- Leisure & Entertainment (0.46)
Adaptive Machine Learning-Driven Multi-Fidelity Stratified Sampling for Failure Analysis of Nonlinear Stochastic Systems
Xu, Liuyun, Spence, Seymour M. J.
Existing variance reduction techniques used in stochastic simulations for rare event analysis still require a substantial number of model evaluations to estimate small failure probabilities. In the context of complex, nonlinear finite element modeling environments, this can become computationally challenging-particularly for systems subjected to stochastic excitation. To address this challenge, a multi-fidelity stratified sampling scheme with adaptive machine learning metamodels is introduced for efficiently propagating uncertainties and estimating small failure probabilities. In this approach, a high-fidelity dataset generated through stratified sampling is used to train a deep learning-based metamodel, which then serves as a cost-effective and highly correlated low-fidelity model. An adaptive training scheme is proposed to balance the trade-off between approximation quality and computational demand associated with the development of the low-fidelity model. By integrating the low-fidelity outputs with additional high-fidelity results, an unbiased estimate of the strata-wise failure probabilities is obtained using a multi-fidelity Monte Carlo framework. The overall probability of failure is then computed using the total probability theorem. Application to a full-scale high-rise steel building subjected to stochastic wind excitation demonstrates that the proposed scheme can accurately estimate exceedance probability curves for nonlinear responses of interest, while achieving significant computational savings compared to single-fidelity variance reduction approaches.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Materials > Construction Materials (0.68)
- Construction & Engineering (0.46)
Prediction-powered estimators for finite population statistics in highly imbalanced textual data: Public hate crime estimation
Waldetoft, Hannes, Torgander, Jakob, Magnusson, Måns
Estimating population parameters in finite populations of text documents can be challenging when obtaining the labels for the target variable requires manual annotation. To address this problem, we combine predictions from a transformer encoder neural network with well-established survey sampling estimators using the model predictions as an auxiliary variable. The applicability is demonstrated in Swedish hate crime statistics based on Swedish police reports. Estimates of the yearly number of hate crimes and the police's under-reporting are derived using the Hansen-Hurwitz estimator, difference estimation, and stratified random sampling estimation. We conclude that if labeled training data is available, the proposed method can provide very efficient estimates with reduced time spent on manual annotation.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (4 more...)
Stratified Non-Negative Tensor Factorization
Sietsema, Alexander, Vural, Zerrin, Chapman, James, Yaniv, Yotam, Needell, Deanna
Non-negative matrix factorization (NMF) and non-negative tensor factorization (NTF) decompose non-negative high-dimensional data into non-negative low-rank components. NMF and NTF methods are popular for their intrinsic interpretability and effectiveness on large-scale data. Recent work developed Stratified-NMF, which applies NMF to regimes where data may come from different sources (strata) with different underlying distributions, and seeks to recover both strata-dependent information and global topics shared across strata. Applying Stratified-NMF to multi-modal data requires flattening across modes, and therefore loses geometric structure contained implicitly within the tensor. To address this problem, we extend Stratified-NMF to the tensor setting by developing a multiplicative update rule and demonstrating the method on text and image data. We find that Stratified-NTF can identify interpretable topics with lower memory requirements than Stratified-NMF. We also introduce a regularized version of the method and demonstrate its effects on image data.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > Canada > Quebec (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
SimpleStrat: Diversifying Language Model Generation with Stratification
Wong, Justin, Orlovskiy, Yury, Luo, Michael, Seshia, Sanjit A., Gonzalez, Joseph E.
Figure 1: Stratfied Sampling vs Temperature Scaling Consider the LLM user request "Name a US State." SimpleStrat employs auto-stratification to utilize the LLM to identify good dimensions of diversity, for instance "East/West of the Mississippi River." Then, SimpleStrat uses stratified sampling to diversify LLM generations. Generating diverse responses from large language models (LLMs) is crucial for applications such as planning/search and synthetic data generation, where diversity provides distinct answers across generations. Prior approaches rely on increasing temperature to increase diversity. However, contrary to popular belief, we show not only does this approach produce lower quality individual generations as temperature increases, but it depends on model's next-token probabilities being similar to the true distribution of answers. We propose SimpleStrat, an alternative approach that uses the language model itself to partition the space into strata. At inference, a random stratum is selected and a sample drawn from within the strata. To measure diversity, we introduce CoverageQA, a dataset of underspecified questions with multiple equally plausible answers, and assess diversity by measuring KL Divergence between the output distribution and uniform distribution over valid ground truth answers. As computing probability per response/solution for proprietary models is infeasible, we measure recall on ground truth solutions. Our evaluation show using SimpleStrat achieves higher recall by 0.05 compared to GPT-4o and 0.36 average reduction in KL Divergence compared to Llama 3. Large language models (LLMs) are routinely resampled in order to get a wide set of plausible generations. Three key settings where this is important are: 1) improving downstream accuracy with planning or search for agentic tasks (i.e. All these use cases rely on the model generating multiple plausible generations for the same prompt when multiple answers exists.
- North America > United States > Missouri (0.05)
- North America > United States > California (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (9 more...)
A model-free subdata selection method for classification
Subdata selection is a study of methods that select a small representative sample of the big data, the analysis of which is fast and statistically efficient. The existing subdata selection methods assume that the big data can be reasonably modeled using an underlying model, such as a (multinomial) logistic regression for classification problems. These methods work extremely well when the underlying modeling assumption is correct but often yield poor results otherwise. In this paper, we propose a model-free subdata selection method for classification problems, and the resulting subdata is called PED subdata. The PED subdata uses decision trees to find a partition of the data, followed by selecting an appropriate sample from each component of the partition. Random forests are used for analyzing the selected subdata. Our method can be employed for a general number of classes in the response and for both categorical and continuous predictors. We show analytically that the PED subdata results in a smaller Gini than a uniform subdata. Further, we demonstrate that the PED subdata has higher classification accuracy than other competing methods through extensive simulated and real datasets.
- North America > United States > New York > Broome County > Binghamton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (0.48)
- Research Report > New Finding (0.34)
The Geometry of the Set of Equivalent Linear Neural Networks
Shewchuk, Jonathan Richard, Bhattacharya, Sagnik
We characterize the geometry and topology of the set of all weight vectors for which a linear neural network computes the same linear transformation $W$. This set of weight vectors is called the fiber of $W$ (under the matrix multiplication map), and it is embedded in the Euclidean weight space of all possible weight vectors. The fiber is an algebraic variety that is not necessarily a manifold. We describe a natural way to stratify the fiber--that is, to partition the algebraic variety into a finite set of manifolds of varying dimensions called strata. We call this set of strata the rank stratification. We derive the dimensions of these strata and the relationships by which they adjoin each other. Although the strata are disjoint, their closures are not. Our strata satisfy the frontier condition: if a stratum intersects the closure of another stratum, then the former stratum is a subset of the closure of the latter stratum. Each stratum is a manifold of class $C^\infty$ embedded in weight space, so it has a well-defined tangent space and normal space at every point (weight vector). We show how to determine the subspaces tangent to and normal to a specified stratum at a specified point on the stratum, and we construct elegant bases for those subspaces. To help achieve these goals, we first derive what we call a Fundamental Theorem of Linear Neural Networks, analogous to what Strang calls the Fundamental Theorem of Linear Algebra. We show how to decompose each layer of a linear neural network into a set of subspaces that show how information flows through the neural network. Each stratum of the fiber represents a different pattern by which information flows (or fails to flow) through the neural network. The topology of a stratum depends solely on this decomposition. So does its geometry, up to a linear transformation in weight space.
- North America > United States > New York (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
Finite-Time Analysis of Stratified Sampling for Monte Carlo
We consider the problem of stratified sampling for Monte-Carlo integration. We model this problem in a multi-armed bandit setting, where the arms represent the strata, and the goal is to estimate a weighted average of the mean values of the arms. We propose a strategy that samples the arms according to an upper bound on their standard deviations and compare its estimation quality to an ideal allocation that would know the standard deviations of the strata.
- North America > Canada > Alberta (0.04)
- Europe > France > Hauts-de-France > Pas-de-Calais (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (0.69)
- Information Technology > Data Science > Data Mining > Big Data (0.35)