uniformly bounded
On the Convergence of Multicalibration Gradient Boosting
Haimovich, Daniel, Linder, Fridolin, Perini, Lorenzo, Tax, Niek, Vojnovic, Milan
Multicalibration gradient boosting has recently emerged as a scalable method that empirically produces approximately multicalibrated predictors and has been deployed at web scale. Despite this empirical success, its convergence properties are not well understood. In this paper, we bridge the gap by providing convergence guarantees for multicalibration gradient boosting in regression with squared-error loss. We show that the magnitude of successive prediction updates decays at $O(1/\sqrt{T})$, which implies the same convergence rate bound for the multicalibration error over rounds. Under additional smoothness assumptions on the weak learners, this rate improves to linear convergence. We further analyze adaptive variants, showing local quadratic convergence of the training loss, and we study rescaling schemes that preserve convergence. Experiments on real-world datasets support our theory and clarify the regimes in which the method achieves fast convergence and strong multicalibration.
- North America > United States > California (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Germany (0.04)
A Unified Kantorovich Duality for Multimarginal Optimal Transport
Cheryala, Yehya, Alaya, Mokhtar Z., Bouzebda, Salim
Multimarginal optimal transport (MOT) has gained increasing attention in recent years, notably due to its relevance in machine learning and statistics, where one seeks to jointly compare and align multiple probability distributions. This paper presents a unified and complete Kantorovich duality theory for MOT problem on general Polish product spaces with bounded continuous cost function. For marginal compact spaces, the duality identity is derived through a convex-analytic reformulation, that identifies the dual problem as a Fenchel-Rockafellar conjugate. We obtain dual attainment and show that optimal potentials may always be chosen in the class of $c$-conjugate families, thereby extending classical two-marginal conjugacy principle into a genuinely multimarginal setting. In non-compact setting, where direct compactness arguments are unavailable, we recover duality via a truncation-tightness procedure based on weak compactness of multimarginal transference plans and boundedness of the cost. We prove that the dual value is preserved under restriction to compact subsets and that admissible dual families can be regularized into uniformly bounded $c$-conjugate potentials. The argument relies on a refined use of $c$-splitting sets and their equivalence with multimarginal $c$-cyclical monotonicity. We then obtain dual attainment and exact primal-dual equality for MOT on arbitrary Polish spaces, together with a canonical representation of optimal dual potentials by $c$-conjugacy. These results provide a structural foundation for further developments in probabilistic and statistical analysis of MOT, including stability, differentiability, and asymptotic theory under marginal perturbations.
- North America > United States > New York (0.04)
- Europe > France > Hauts-de-France > Oise > Compiègne (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Sharp Structure-Agnostic Lower Bounds for General Functional Estimation
Jin, Jikai, Syrgkanis, Vasilis
The design of efficient nonparametric estimators has long been a central problem in statistics, machine learning, and decision making. Classical optimal procedures often rely on strong structural assumptions, which can be misspecified in practice and complicate deployment. This limitation has sparked growing interest in structure-agnostic approaches -- methods that debias black-box nuisance estimates without imposing structural priors. Understanding the fundamental limits of these methods is therefore crucial. This paper provides a systematic investigation of the optimal error rates achievable by structure-agnostic estimators. We first show that, for estimating the average treatment effect (ATE), a central parameter in causal inference, doubly robust learning attains optimal structure-agnostic error rates. We then extend our analysis to a general class of functionals that depend on unknown nuisance functions and establish the structure-agnostic optimality of debiased/double machine learning (DML). We distinguish two regimes -- one where double robustness is attainable and one where it is not -- leading to different optimal rates for first-order debiasing, and show that DML is optimal in both regimes. Finally, we instantiate our general lower bounds by deriving explicit optimal rates that recover existing results and extend to additional estimands of interest. Our results provide theoretical validation for widely used first-order debiasing methods and guidance for practitioners seeking optimal approaches in the absence of structural assumptions. This paper generalizes and subsumes the ATE lower bound established in \citet{jin2024structure} by the same authors.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Eventually LIL Regret: Almost Sure $\ln\ln T$ Regret for a sub-Gaussian Mixture on Unbounded Data
Agrawal, Shubhada, Ramdas, Aaditya
We prove that a classic sub-Gaussian mixture proposed by Robbins in a stochastic setting actually satisfies a path-wise (deterministic) regret bound. For every path in a natural ``Ville event'' $E_α$, this regret till time $T$ is bounded by $\ln^2(1/α)/V_T + \ln (1/α) + \ln \ln V_T$ up to universal constants, where $V_T$ is a nonnegative, nondecreasing, cumulative variance process. (The bound reduces to $\ln(1/α) + \ln \ln V_T$ if $V_T \geq \ln(1/α)$.) If the data were stochastic, then one can show that $E_α$ has probability at least $1-α$ under a wide class of distributions (eg: sub-Gaussian, symmetric, variance-bounded, etc.). In fact, we show that on the Ville event $E_0$ of probability one, the regret on every path in $E_0$ is eventually bounded by $\ln \ln V_T$ (up to constants). We explain how this work helps bridge the world of adversarial online learning (which usually deals with regret bounds for bounded data), with game-theoretic statistics (which can handle unbounded data, albeit using stochastic assumptions). In short, conditional regret bounds serve as a bridge between stochastic and adversarial betting.
Flexible Deep Neural Networks for Partially Linear Survival Data
Arie, Asaf Ben, Gorfine, Malka
We propose a flexible deep neural network (DNN) framework for modeling survival data within a partially linear regression structure. The approach preserves interpretability through a parametric linear component for covariates of primary interest, while a nonparametric DNN component captures complex time-covariate interactions among nuisance variables. We refer to the method as FLEXI-Haz, a flexible hazard model with a partially linear structure. In contrast to existing DNN approaches for partially linear Cox models, FLEXI-Haz does not rely on the proportional hazards assumption. We establish theoretical guarantees: the neural network component attains minimax-optimal convergence rates based on composite Holder classes, and the linear estimator is root-n consistent, asymptotically normal, and semiparametrically efficient. Extensive simulations and real-data analyses demonstrate that FLEXI-Haz provides accurate estimation of the linear effect, offering a principled and interpretable alternative to modern methods based on proportional hazards. Code for implementing FLEXI-Haz, as well as scripts for reproducing data analyses and simulations, is available at: https://github.com/AsafBanana/FLEXI-Haz
- North America > United States > New York (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Limits of Discrete Energy of Families of Increasing Sets
The Hausdorff dimension of a set can be detected using the Riesz energy. Here, we consider situations where a sequence of points, $\{x_n\}$, ``fills in'' a set $E \subset \mathbb{R}^d$ in an appropriate sense and investigate the degree to which the discrete analog to the Riesz energy of these sets can be used to bound the Hausdorff dimension of $E$. We also discuss applications to data science and Erdős/Falconer type problems.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Oceania > Australia (0.04)
- (3 more...)
FOSSIL: Regret-minimizing weighting for robust learning under imbalance and small data
Cha, J., Lee, J., Cho, J., Shin, J.
Imbalanced and small data regimes are pervasive in domains such as rare disease imaging, genomics, and disaster response, where labeled samples are scarce and naive augmentation often introduces artifacts. Existing solutions such as oversampling, focal loss, or meta-weighting address isolated aspects of this challenge but remain fragile or complex. We introduce FOSSIL (Flexible Optimization via Sample Sensitive Importance Learning), a unified weighting framework that seamlessly integrates class imbalance correction, difficulty-aware curricula, augmentation penalties, and warmup dynamics into a single interpretable formula. Unlike prior heuristics, the proposed framework provides regret-based theoretical guarantees and achieves consistent empirical gains over ERM, curriculum, and meta-weighting baselines on synthetic and real-world datasets, while requiring no architectural changes.
- North America > United States > Ohio (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
Supplementary material for the paper The emergence of clusters in self attention dynamics
This appendix is organized as follows: Appendix A: Well-posedness results. Throughout the remainder of the paper, we use the terminology "tokens" Definition 3 (Equi-compactly supported curves) . To prove Proposition A.1, we show a more general result concerning global existence and uniqueness We will make use of the following lemma regarding ( 6). R. We now show ( 8). R), which finally leads us to ( 9).
Collision-Free Bearing-Driven Formation Tracking for Euler-Lagrange Systems
Cheng, Haoshu, Guay, Martin, Wang, Shimin, Che, Yunhong
In this paper, we investigate the problem of tracking formations driven by bearings for heterogeneous Euler-Lagrange systems with parametric uncertainty in the presence of multiple moving leaders. To estimate the leaders' velocities and accelerations, we first design a distributed observer for the leader system, utilizing a bearing-based localization condition in place of the conventional connectivity assumption. This observer, coupled with an adaptive mechanism, enables the synthesis of a novel distributed control law that guides the formation towards the target formation, without requiring prior knowledge of the system parameters. Furthermore, we establish a sufficient condition, dependent on the initial formation configuration, that ensures collision avoidance throughout the formation evolution. The effectiveness of the proposed approach is demonstrated through a numerical example. Keywords: Bearing-based formation, distributed observer, multi-agent systems, Euler-Lagrange system1.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > Canada > Ontario > Kingston (0.04)
- Europe > Norway > Norwegian Sea (0.04)
- Asia > Singapore (0.04)