arXiv.org Machine Learning
Learning Credal Ensembles via Distributionally Robust Optimization
Wang, Kaizheng, Faza, Ghifari Adam, Cuzzolin, Fabio, Chau, Siu Lun, Moens, David, Hallez, Hans
Credal predictors are models that are aware of epistemic uncertainty and produce a convex set of probabilistic predictions. They offer a principled way to quantify predictive epistemic uncertainty (EU) and have been shown to improve model robustness in various settings. However, most state-of-the-art methods mainly define EU as disagreement caused by random training initializations, which mostly reflects sensitivity to optimization randomness rather than uncertainty from deeper sources. To address this, we define EU as disagreement among models trained with varying relaxations of the i.i.d. assumption between training and test data. Based on this idea, we propose CreDRO, which learns an ensemble of plausible models through distributionally robust optimization. As a result, CreDRO captures EU not only from training randomness but also from meaningful disagreement due to potential distribution shifts between training and test data. Empirical results show that CreDRO consistently outperforms existing credal methods on tasks such as out-of-distribution detection across multiple benchmarks and selective classification in medical applications.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- Europe > Netherlands (0.04)
- Asia > Singapore (0.04)
- Research Report > New Finding (0.48)
- Research Report > Promising Solution (0.34)
LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees
Santos, Vagner, Coscrato, Victor, Cabezas, Luben, Izbicki, Rafael, Ramos, Thiago
Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify uncertainty. Conformal prediction provides distribution-free marginal coverage, yet split conformal uses a single global residual quantile and can be poorly adaptive under heteroscedasticity. Methods that improve adaptivity typically fit auxiliary nuisance models or introduce additional data splits/partitions to learn the conformal score, increasing cost and reducing data efficiency. We propose LoBoost, a model-native local conformal method that reuses the fitted ensemble's leaf structure to define multiscale calibration groups. Each input is encoded by its sequence of visited leaves; at resolution level k, we group points by matching prefixes of leaf indices across the first k trees and calibrate residual quantiles within each group. LoBoost requires no retraining, auxiliary models, or extra splitting beyond the standard train/calibration split. Experiments show competitive interval quality, improved test MSE on most datasets, and large calibration speedups.
- South America > Brazil > São Paulo (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- North America > United States > Massachusetts (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Low-degree Lower bounds for clustering in moderate dimension
Carpentier, Alexandra, Verzelen, Nicolas
We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, we investigate the requisite minimal distance $Δ$ between mean vectors to partially recover the underlying partition. While the minimax-optimal threshold for $Δ$ is well-established, a significant gap exists between this information-theoretic limit and the performance of known polynomial-time procedures. Although this gap was recently characterized in the high-dimensional regime ($n \leq dK$), it remains largely unexplored in the moderate-dimensional regime ($n \geq dK$). In this manuscript, we address this regime by establishing a new low-degree polynomial lower bound for the moderate-dimensional case when $d \geq K$. We show that while the difficulty of clustering for $n \leq dK$ is primarily driven by dimension reduction and spectral methods, the moderate-dimensional regime involves more delicate phenomena leading to a "non-parametric rate". We provide a novel non-spectral algorithm matching this rate, shedding new light on the computational limits of the clustering problem in moderate dimension.
- North America > United States (0.14)
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
- Europe > Italy > Lazio > Rome (0.04)
- (3 more...)
Regular Fourier Features for Nonstationary Gaussian Processes
Jawaid, Arsalan, Karatas, Abdullah, Seewig, Jörg
Simulating a Gaussian process requires sampling from a high-dimensional Gaussian distribution, which scales cubically with the number of sample locations. Spectral methods address this challenge by exploiting the Fourier representation, treating the spectral density as a probability distribution for Monte Carlo approximation. Although this probabilistic interpretation works for stationary processes, it is overly restrictive for the nonstationary case, where spectral densities are generally not probability measures. We propose regular Fourier features for harmonizable processes that avoid this limitation. Our method discretizes the spectral representation directly, preserving the correlation structure among spectral weights without requiring probability assumptions. Under a finite spectral support assumption, this yields an efficient low-rank approximation that is positive semi-definite by construction. When the spectral density is unknown, the framework extends naturally to kernel learning from data. We demonstrate the method on locally stationary kernels and on harmonizable mixture kernels with complex-valued spectral densities.
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- North America > United States (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
Efficient Uncoupled Learning Dynamics with $\tilde{O}\!\left(T^{-1/4}\right)$ Last-Iterate Convergence in Bilinear Saddle-Point Problems over Convex Sets under Bandit Feedback
Maiti, Arnab, Zhang, Claire Jie, Jamieson, Kevin, Morgenstern, Jamie Heather, Panageas, Ioannis, Ratliff, Lillian J.
In this paper, we study last-iterate convergence of learning algorithms in bilinear saddle-point problems, a preferable notion of convergence that captures the day-to-day behavior of learning dynamics. We focus on the challenging setting where players select actions from compact convex sets and receive only bandit feedback. Our main contribution is the design of an uncoupled learning algorithm that guarantees last-iterate convergence to the Nash equilibrium with high probability. We establish a convergence rate of $\tilde{O}(T^{-1/4})$ up to polynomial factors in problem parameters. Crucially, our proposed algorithm is computationally efficient, requiring only an efficient linear optimization oracle over the players' compact action sets. The algorithm is obtained by combining techniques from experimental design and the classic Follow-The-Regularized-Leader (FTRL) framework, with a carefully chosen regularizer function tailored to the geometry of the action set of each learner.
- North America > United States > California > Orange County > Irvine (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Jordan (0.04)
Conditional neural control variates for variance reduction in Bayesian inverse problems
Bayesian inference for inverse problems involves computing expectations under posterior distributions -- e.g., posterior means, variances, or predictive quantities -- typically via Monte Carlo (MC) estimation. When the quantity of interest varies significantly under the posterior, accurate estimates demand many samples -- a cost often prohibitive for partial differential equation-constrained problems. To address this challenge, we introduce conditional neural control variates, a modular method that learns amortized control variates from joint model-data samples to reduce the variance of MC estimators. To scale to high-dimensional problems, we leverage Stein's identity to design an architecture based on an ensemble of hierarchical coupling layers with tractable Jacobian trace computation. Training requires: (i) samples from the joint distribution of unknown parameters and observed data; and (ii) the posterior score function, which can be computed from physics-based likelihood evaluations, neural operator surrogates, or learned generative models such as conditional normalizing flows. Once trained, the control variates generalize across observations without retraining. We validate our approach on stylized and partial differential equation-constrained Darcy flow inverse problems, demonstrating substantial variance reduction, even when the analytical score is replaced by a learned surrogate.
Stochastic Discount Factors with Cross-Asset Spillovers
The central objective of empirical asset pricing is to identify firm-level signals that explain the cross-section of expected stock returns--whether through exposure to risk factors or persistent mispricing. The dominant paradigm, grounded in the assumption of self-predictability, asserts that a firm's own characteristics forecast its own returns (see, e.g., Cochrane (2011); Harvey et al. (2016)). Complementing this view is a growing literature on cross-predictability--the idea that the characteristics or returns of one asset can help forecast the returns of others (see, e.g., Lo and MacKinlay (1990); Hou (2007); Cohen and Frazzini (2008); Cohen and Lou (2012); Huang et al. (2021, 2022)). A key mechanism underpinning this phenomenon is the presence of lead-lag effects, whereby price movements or information from one firm precede and predict those of related firms. Such effects can stem from staggered information diffusion, peer influence within industries, supply chain linkages, or correlated trading by institutional investors that induces price pressure across related assets. Despite recent methodological advances in modeling cross-stock predictability, several foundational questions remain unresolved. Chief among them is how a mean-variance investor can analytically integrate multiple predictive signals when returns are interconnected across assets. Equally crucial is developing a framework that jointly captures both the relevance of individual signals and the structure of return spillovers--enhancing portfolio performance while preserving interpretability .
- North America > United States (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- (3 more...)
An Enhanced Projection Pursuit Tree Classifier with Visual Methods for Assessing Algorithmic Improvements
da Silva, Natalia, Cook, Dianne, Lee, Eun-Kyung
This paper presents enhancements to the projection pursuit tree classifier and visual diagnostic methods for assessing their impact in high dimensions. The original algorithm uses linear combinations of variables in a tree structure where depth is constrained to be less than the number of classes -- a limitation that proves too rigid for complex classification problems. Our extensions improve performance in multi-class settings with unequal variance-covariance structures and nonlinear class separations by allowing more splits and more flexible class groupings in the projection pursuit computation. Proposing algorithmic improvements is straightforward; demonstrating their actual utility is not. We therefore develop two visual diagnostic approaches to verify that the enhancements perform as intended. Using high-dimensional visualization techniques, we examine model fits on benchmark datasets to assess whether the algorithm behaves as theorized. An interactive web application enables users to explore the behavior of both the original and enhanced classifiers under controlled scenarios. The enhancements are implemented in the R package PPtreeExt.
Characterizing Online and Private Learnability under Distributional Constraints via Generalized Smoothness
Blanchard, Moïse, Shetty, Abhishek, Rakhlin, Alexander
Understanding minimal assumptions that enable learning and generalization is perhaps the central question of learning theory. Several celebrated results in statistical learning theory, such as the VC theorem and Littlestone's characterization of online learnability, establish conditions on the hypothesis class that allow for learning under independent data and adversarial data, respectively. Building upon recent work bridging these extremes, we study sequential decision making under distributional adversaries that can adaptively choose data-generating distributions from a fixed family $U$ and ask when such problems are learnable with sample complexity that behaves like the favorable independent case. We provide a near complete characterization of families $U$ that admit learnability in terms of a notion known as generalized smoothness i.e. a distribution family admits VC-dimension-dependent regret bounds for every finite-VC hypothesis class if and only if it is generalized smooth. Further, we give universal algorithms that achieve low regret under any generalized smooth adversary without explicit knowledge of $U$. Finally, when $U$ is known, we provide refined bounds in terms of a combinatorial parameter, the fragmentation number, that captures how many disjoint regions can carry nontrivial mass under $U$. These results provide a nearly complete understanding of learnability under distributional adversaries. In addition, building upon the surprising connection between online learning and differential privacy, we show that the generalized smoothness also characterizes private learnability under distributional constraints.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Middle East > Jordan (0.04)
Maximum entropy based testing in network models: ERGMs and constrained optimization
Ghosh, Subhrosekhar, Karmakar, Rathindra Nath, Lahiry, Samriddha
Stochastic network models play a central role across a wide range of scientific disciplines, and questions of statistical inference arise naturally in this context. In this paper we investigate goodness-of-fit and two-sample testing procedures for statistical networks based on the principle of maximum entropy (MaxEnt). Our approach formulates a constrained entropy-maximization problem on the space of networks, subject to prescribed structural constraints. The resulting test statistics are defined through the Lagrange multipliers associated with the constrained optimization problem, which, to our knowledge, is novel in the statistical networks literature. We establish consistency in the classical regime where the number of vertices is fixed. We then consider asymptotic regimes in which the graph size grows with the sample size, developing tests for both dense and sparse settings. In the dense case, we analyze exponential random graph models (ERGM) (including the Erdös-Rènyi models), while in the sparse regime our theory applies to Erd{ö}s-R{è}nyi graphs. Our analysis leverages recent advances in nonlinear large deviation theory for random graphs. We further show that the proposed Lagrange-multiplier framework connects naturally to classical score tests for constrained maximum likelihood estimation. The results provide a unified entropy-based framework for network model assessment across diverse growth regimes.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > Singapore (0.04)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.04)
- North America > United States > New York (0.04)
- Research Report (0.64)
- Overview (0.45)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- (2 more...)