Goto

Collaborating Authors

 Madison



Bayesian experimental design: grouped geometric pooled posterior via ensemble Kalman methods

Yang, Huchen, Dong, Xinghao, Wu, Jinlong

arXiv.org Machine Learning

Bayesian experimental design (BED) for complex physical systems is often limited by the nested inference required to estimate the expected information gain (EIG) or its gradients. Each outer sample induces a different posterior, creating a large and heterogeneous set of inference targets. Existing methods have to sacrifice either accuracy or efficiency: they either perform per-outer-sample posterior inference, which yields higher fidelity but at prohibitive computational cost, or amortize the inner inference across all outer samples for computational reuse, at the risk of degraded accuracy under posterior heterogeneity. To improve accuracy and maintain cost at the amortized level, we propose a grouped geometric pooled posterior framework that partitions outer samples into groups and constructs a pooled proposal for each group. While such grouping strategy would normally require generating separate proposal samples for different groups, our tailored ensemble Kalman inversion (EKI) formulation generates these samples without extra forward-model evaluation cost. We also introduce a conservative diagnostic to assess importance-sampling quality to guide grouping. This grouping strategy improves within-group proposal-target alignment, yielding more accurate and stable estimators while keeping the cost comparable to amortized approaches. We evaluate the performance of our method on both Gaussian-linear and high-dimensional network-based model discrepancy calibration problems.


Revisiting Active Sequential Prediction-Powered Mean Estimation

Sfyraki, Maria-Eleni, Wang, Jun-Kun

arXiv.org Machine Learning

In this work, we revisit the problem of active sequential prediction-powered mean estimation, where at each round one must decide the query probability of the ground-truth label upon observing the covariates of a sample. Furthermore, if the label is not queried, the prediction from a machine learning model is used instead. Prior work proposed an elegant scheme that determines the query probability by combining an uncertainty-based suggestion with a constant probability that encodes a soft constraint on the query probability. We explored different values of the mixing parameter and observed an intriguing empirical pattern: the smallest confidence width tends to occur when the weight on the constant probability is close to one, thereby reducing the influence of the uncertainty-based component. Motivated by this observation, we develop a non-asymptotic analysis of the estimator and establish a data-dependent bound on its confidence interval. Our analysis further suggests that when a no-regret learning approach is used to determine the query probability and control this bound, the query probability converges to the constraint of the max value of the query probability when it is chosen obliviously to the current covariates. We also conduct simulations that corroborate these theoretical findings.


MinShap: A Modified Shapley Value Approach for Feature Selection

Zheng, Chenghui, Raskutti, Garvesh

arXiv.org Machine Learning

Feature selection is a classical problem in statistics and machine learning, and it continues to remain an extremely challenging problem especially in the context of unknown non-linear relationships with dependent features. On the other hand, Shapley values are a classic solution concept from cooperative game theory that is widely used for feature attribution in general non-linear models with highly-dependent features. However, Shapley values are not naturally suited for feature selection since they tend to capture both direct effects from each feature to the response and indirect effects through other features. In this paper, we combine the advantages of Shapley values and adapt them to feature selection by proposing \emph{MinShap}, a modification of the Shapley value framework along with a suite of other related algorithms. In particular for MinShap, instead of taking the average marginal contributions over permutations of features, considers the minimum marginal contribution across permutations. We provide a theoretical foundation motivated by the faithfulness assumption in DAG (directed acyclic graphical models), a guarantee for the Type I error of MinShap, and show through numerical simulations and real data experiments that MinShap tends to outperform state-of-the-art feature selection algorithms such as LOCO, GCM and Lasso in terms of both accuracy and stability. We also introduce a suite of algorithms related to MinShap by using the multiple testing/p-value perspective that improves performance in lower-sample settings and provide supporting theoretical guarantees.


Cost-optimal Sequential Testing via Doubly Robust Q-learning

Zhou, Doudou, Zhang, Yiran, Jin, Dian, Zheng, Yingye, Tian, Lu, Cai, Tianxi

arXiv.org Machine Learning

Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.


Candy now tastes different. It's not just you.

Popular Science

From recipe changes to aging taste buds, here's why those peanut butter cups don't hit like they used to. More information Adding us as a Preferred Source in Google by using this link indicates that you would like to see more of our content in Google News results. There's a reason you might remember Hershey's chocolate differently. Breakthroughs, discoveries, and DIY tips sent six days a week. Brad Reese, grandson of Reese's Peanut Butter Cups inventor H.B. Reese, caused a stir this year with his claims that The Hershey Company had changed his grandfather's recipes beyond recognition .


Biconvex Biclustering

Rosen, Sam, Chi, Eric C., Xu, Jason

arXiv.org Machine Learning

This article proposes a biconvex modification to convex biclustering in order to improve its performance in high-dimensional settings. In contrast to heuristics that discard a subset of noisy features a priori, our method jointly learns and accordingly weighs informative features while discovering biclusters. Moreover, the method is adaptive to the data, and is accompanied by an efficient algorithm based on proximal alternating minimization, complete with detailed guidance on hyperparameter tuning and efficient solutions to optimization subproblems. These contributions are theoretically grounded; we establish finite-sample bounds on the objective function under sub-Gaussian errors, and generalize these guarantees to cases where input affinities need not be uniform. Extensive simulation results reveal our method consistently recovers underlying biclusters while weighing and selecting features appropriately, outperforming peer methods. An application to a gene microarray dataset of lymphoma samples recovers biclusters matching an underlying classification, while giving additional interpretation to the mRNA samples via the column groupings and fitted weights.


The Theorems of Dr. David Blackwell and Their Contributions to Artificial Intelligence

Paxton, Napoleon

arXiv.org Machine Learning

Dr. David Blackwell was a mathematician and statistician of the first rank, whose contributions to statistical theory, game theory, and decision theory predated many of the algorithmic breakthroughs that define modern artificial intelligence. This survey examines three of his most consequential theoretical results the Rao Blackwell theorem, the Blackwell Approachability theorem, and the Blackwell Informativeness theorem (comparison of experiments) and traces their direct influence on contemporary AI and machine learning. We show that these results, developed primarily in the 1940s and 1950s, remain technically live across modern subfields including Markov Chain Monte Carlo inference, autonomous mobile robot navigation (SLAM), generative model training, no-regret online learning, reinforcement learning from human feedback (RLHF), large language model alignment, and information design. NVIDIAs 2024 decision to name their flagship GPU architecture (Blackwell) provides vivid testament to his enduring relevance. We also document an emerging frontier: explicit Rao Blackwellized variance reduction in LLM RLHF pipelines, recently proposed but not yet standard practice. Together, Blackwell theorems form a unified framework addressing information compression, sequential decision making under uncertainty, and the comparison of information sources precisely the problems at the core of modern AI.


Fused Multinomial Logistic Regression Utilizing Summary-Level External Machine-learning Information

Dai, Chi-Shian, Shao, Jun

arXiv.org Machine Learning

In many modern applications, a carefully designed primary study provides individual-level data for interpretable modeling, while summary-level external information is available through black-box, efficient, and nonparametric machine-learning predictions. Although summary-level external information has been studied in the data integration literature, there is limited methodology for leveraging external nonparametric machine-learning predictions to improve statistical inference in the primary study. We propose a general empirical-likelihood framework that incorporates external predictions through moment constraints. An advantage of nonparametric machine-learning prediction is that it induces a rich class of valid moment restrictions that remain robust to covariate shift under a mild overlap condition without requiring explicit density-ratio modeling. We focus on multinomial logistic regression as the primary model and address common data-quality issues in external sources, including coarsened outcomes, partially observed covariates, covariate shift, and heterogeneity in generating mechanisms known as concept shift. We establish large-sample properties of the resulting fused estimator, including consistency and asymptotic normality under regularity conditions. Moreover, we provide mild sufficient conditions under which incorporating external predictions delivers a strict efficiency gain relative to the primary-only estimator. Simulation studies and an application to the National Health and Nutrition Examination Survey on multiclass blood-pressure classification.


Robust Regression with Adaptive Contamination in Response: Optimal Rates and Computational Barriers

Diakonikolas, Ilias, Gao, Chao, Kane, Daniel M., Pensia, Ankit, Xie, Dong

arXiv.org Machine Learning

We study robust regression under a contamination model in which covariates are clean while the responses may be corrupted in an adaptive manner. Unlike the classical Huber's contamination model, where both covariates and responses may be contaminated and consistent estimation is impossible when the contamination proportion is a non-vanishing constant, it turns out that the clean-covariate setting admits strictly improved statistical guarantees. Specifically, we show that the additional information in the clean covariates can be carefully exploited to construct an estimator that achieves a better estimation rate than that attainable under Huber contamination. In contrast to the Huber model, this improved rate implies consistency even when the contamination is a constant. A matching minimax lower bound is established using Fano's inequality together with the construction of contamination processes that match $m> 2$ distributions simultaneously, extending the previous two-point lower bound argument in Huber's setting. Despite the improvement over the Huber model from an information-theoretic perspective, we provide formal evidence -- in the form of Statistical Query and Low-Degree Polynomial lower bounds -- that the problem exhibits strong information-computation gaps. Our results strongly suggest that the information-theoretic improvements cannot be achieved by polynomial-time algorithms, revealing a fundamental gap between information-theoretic and computational limits in robust regression with clean covariates.