Tail-Aware Information-Theoretic Generalization for RLHF and SGLD
Zhang, Huiming, Li, Binghan, Tian, Wan, Sun, Qiang
Classical information-theoretic generalization bounds typically control the generalization gap through KL-based mutual information and therefore rely on boundedness or sub-Gaussian tails via the moment generating function (MGF). In many modern pipelines, such as robust learning, RLHF, and stochastic optimization, losses and rewards can be heavy-tailed, and MGFs may not exist, rendering KL-based tools ineffective. We develop a tail-dependent information-theoretic framework for sub-Weibull data, where the tail parameter $θ$ controls the tail heaviness: $θ=2$ corresponds to sub-Gaussian, $θ=1$ to sub-exponential, and $0<θ<1$ to genuinely heavy tails. Our key technical ingredient is a decorrelation lemma that bounds change-of-measure expectations using a shifted-log $f_θ$-divergence, which admits explicit comparisons to Rényi divergence without MGF arguments. On the empirical-process side, we establish sharp maximal inequalities and a Dudley-type chaining bound for sub-Weibull processes with tail index $θ$, with complexity scaling as $\log^{1/θ}$ and entropy$^{1/θ}$. These tools yield expected and high-probability PAC-Bayes generalization bounds, as well as an information-theoretic chaining inequality based on multiscale Rényi mutual information. We illustrate the consequences in Rényi-regularized RLHF under heavy-tailed rewards and in stochastic gradient Langevin dynamics with heavy-tailed gradient noise.
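The tail regimes in the abstract are easy to probe numerically. Below is a minimal Python sketch (mine, not the paper's) that samples sub-Weibull variables with tail index $θ$ via the construction $X = \pm E^{1/θ}$ with $E \sim \mathrm{Exp}(1)$, checks the tail envelope $\exp(-t^θ)$, and illustrates why MGF-based (Chernoff/KL) tools fail for $θ < 1$: the empirical MGF diverges with the sample size.

```python
import numpy as np

rng = np.random.default_rng(0)

def sub_weibull(theta, n):
    # X = sign * E^{1/theta}, E ~ Exp(1), satisfies P(|X| >= t) = exp(-t^theta):
    # theta = 2 is sub-Gaussian, theta = 1 sub-exponential, theta < 1 heavy-tailed.
    e = rng.exponential(size=n)
    return rng.choice([-1.0, 1.0], size=n) * e ** (1.0 / theta)

t = 3.0
for theta in (2.0, 1.0, 0.5):
    x = sub_weibull(theta, 1_000_000)
    print(f"theta={theta}: empirical P(|X| >= {t}) = {np.mean(np.abs(x) >= t):.2e}, "
          f"envelope exp(-t^theta) = {np.exp(-t**theta):.2e}")

# For theta < 1 the MGF E[exp(sX)] is infinite for every s > 0, so Chernoff/
# KL-based change-of-measure arguments are unavailable: the empirical MGF
# keeps growing with the sample size instead of stabilizing.
x = sub_weibull(0.5, 1_000_000)
for n in (10**3, 10**4, 10**5, 10**6):
    print(n, float(np.exp(x[:n]).mean()))
```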
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
Demographic Parity Tails for Regression
Le, Naht Sinh, Denis, Christophe, Hebiri, Mohamed
Demographic parity (DP) is a widely studied fairness criterion in regression, enforcing independence between the predictions and sensitive attributes. However, constraining the entire distribution can degrade predictive accuracy and may be unnecessary in applications where fairness concerns are localized to specific regions of the distribution. To overcome this issue, we propose a new framework for regression under DP that focuses on the tails of the target distribution across sensitive groups. Our methodology builds on optimal transport theory. By enforcing fairness constraints only over targeted regions of the distribution, our approach enables more nuanced and context-sensitive interventions. Building on recent advances, we develop an interpretable and flexible algorithm that exploits the geometric structure of optimal transport. We provide theoretical guarantees, including risk bounds and fairness properties, and validate the method through experiments in regression settings.
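In one dimension, optimal transport reduces to quantile matching, so the tail-localized intervention can be sketched as a simple post-processing step. The sketch below is illustrative only (the function name `repair_tail`, the threshold `tau`, and the barycenter weighting are my assumptions, not the authors' algorithm): above each group's τ-quantile, predictions are pushed onto a shared Wasserstein-barycenter tail, while the bulk is left untouched.

```python
import numpy as np

def repair_tail(preds, groups, tau=0.9, n_grid=512):
    """Hypothetical sketch: enforce demographic parity only on the upper tail
    by mapping each group's predictions above its tau-quantile onto a shared
    Wasserstein-barycenter tail (1-D OT = quantile-to-quantile matching)."""
    preds = np.asarray(preds, dtype=float)
    out = preds.copy()
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()
    qs = np.linspace(tau, 1.0, n_grid, endpoint=False)   # tail quantile levels
    # Barycenter tail quantile function: weighted mean of group quantile functions.
    bary = sum(w * np.quantile(preds[groups == g], qs)
               for g, w in zip(labels, weights))
    for g in labels:
        mask = groups == g
        gq = np.quantile(preds[mask], qs)
        tail = mask & (preds >= gq[0])            # points above the tau-quantile
        # Within-group tail rank, mapped to a level in [tau, 1) ...
        levels = tau + (preds[tail][:, None] >= gq).mean(axis=1) * (1.0 - tau)
        # ... and then onto the shared barycenter tail (monotone, rank-preserving).
        out[tail] = np.interp(levels, qs, bary)
    return out

# Toy check: two groups with different tails; after repair, upper tails agree.
rng = np.random.default_rng(0)
g = rng.integers(0, 2, 20_000)
y = rng.normal(loc=0.5 * g, scale=1.0 + 0.5 * g)
r = repair_tail(y, g, tau=0.9)
print([round(float(np.quantile(r[g == k], 0.99)), 3) for k in (0, 1)])
```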
- North America > United States > California (0.04)
- Europe > France (0.04)
Sharp Concentration Inequalities: Phase Transition and Mixing of Orlicz Tails with Variance
In this work, we investigate how to develop sharp concentration inequalities for sub-Weibull random variables, including sub-Gaussian and sub-exponential distributions. Although the random variables may not be sub-Gaussian, the tail probability near the origin behaves as if they were, while the decay elsewhere aligns with the Orlicz $Ψ_α$-tail. Specifically, for independent and identically distributed (i.i.d.) $\{X_i\}_{i=1}^n$ with finite Orlicz norm $\|X\|_{Ψ_α}$, our theory reveals an interesting phase transition at $α = 2$: $\mathbb{P}\left(\left|\sum_{i=1}^n X_i\right| \geq t\right)$ with $t > 0$ is upper bounded by $2\exp\left(-C\max\left\{\frac{t^2}{n\|X\|_{Ψ_α}^2}, \frac{t^α}{n^{α-1}\|X\|_{Ψ_α}^α}\right\}\right)$ for $α \geq 2$, and by $2\exp\left(-C\min\left\{\frac{t^2}{n\|X\|_{Ψ_α}^2}, \frac{t^α}{n^{α-1}\|X\|_{Ψ_α}^α}\right\}\right)$ for $1 \leq α \leq 2$, with some positive constant $C$. In many scenarios it is necessary to distinguish the standard deviation from the Orlicz norm, since the latter can greatly exceed the former. To accommodate this, we build a new theoretical analysis framework, and our sharp, flexible concentration inequalities involve the variance and a mixing of Orlicz $Ψ_α$-tails through the min and max functions. Our theory yields new, improved concentration inequalities even for the sub-Gaussian and sub-exponential cases ($α = 2$ and $α = 1$, respectively). We further demonstrate our theory on martingales, random vectors, random matrices, and covariance matrix estimation. These sharp concentration inequalities enable more precise non-asymptotic analyses across statistical and machine learning applications.
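The max/min switch at $α = 2$ is easy to see numerically. The Python sketch below evaluates the two-regime bound with placeholder constants ($C$ and $K$ are illustrative, not the paper's): near the origin the Gaussian term $t^2/n$ governs both regimes, and the crossover to the $Ψ_α$ term happens around $t \asymp n$ (for $K = 1$).

```python
import numpy as np

def tail_bound(t, n, alpha, K=1.0, C=0.25):
    # Two competing exponents: a Gaussian/CLT term and an Orlicz Psi_alpha term.
    gauss = t**2 / (n * K**2)
    orlicz = t**alpha / (n**(alpha - 1) * K**alpha)
    combine = max if alpha >= 2 else min      # the phase transition at alpha = 2
    return 2.0 * np.exp(-C * combine(gauss, orlicz))

# For alpha = 1 (sub-exponential) the min picks the Gaussian term near the
# origin (t < n) and the exponential term in the far tail; for alpha = 3 the
# max always selects the larger exponent, giving the smaller bound.
n = 10_000
for alpha in (1.0, 3.0):
    for t in (50, 500, 5_000, 50_000):
        print(f"alpha={alpha}, t={t}: bound = {tail_bound(t, n, alpha):.3e}")
```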
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
High-Resolution Tensor-Network Fourier Methods for Exponentially Compressed Non-Gaussian Aggregate Distributions
Rodríguez-Aldavero, Juan José, García-Ripoll, Juan José
Its low-rank QTT structure arises from intrinsic spectral smoothness in continuous models, or from spectral energy concentration as the number of components $D$ grows in discrete models. We demonstrate this on weighted sums of Bernoulli and lognormal random variables. In the latter, the approach reaches high-resolution discretizations of $N = 2^{30}$ frequency modes on standard hardware, far beyond the $N = 2^{24}$ ceiling of dense implementations. These compressed representations enable efficient computation of Value at Risk (VaR) and Expected Shortfall (ES), supporting applications in quantitative finance and beyond.

I. INTRODUCTION

Weighted sums of independent random variables constitute a basic probabilistic model, describing macroscopic behavior arising from the aggregation of microscopic stochastic components. These models arise in a wide range of applications. Their probability distribution generally lacks a closed-form expression, and its evaluation involves multidimensional convolution integrals that are susceptible to the curse of dimensionality. Consequently, evaluating these models relies on specialized numerical methods. While Fourier-type spectral methods have been adapted for discrete settings [18, 19], they are frequently hampered by persistent Gibbs oscillations, which arise from distributional discontinuities and preclude uniform convergence [20, 21]. No existing method simultaneously achieves an accurate approximation of the exact, fully non-Gaussian target distribution while remaining scalable to larger, practically relevant system sizes. In this work, we introduce a new algorithm that combines the Fourier spectral method with tensor-network techniques.
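For orientation, the dense Fourier baseline that the $N = 2^{24}$ ceiling refers to can be sketched in a few lines of plain numpy (no QTT compression; all parameters are illustrative): multiply the component characteristic functions on a frequency grid, invert with an FFT, and read VaR/ES off the recovered distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 50, 2**16                    # components; grid size (dense methods cap near 2^24)
w = rng.uniform(0.5, 1.5, D)        # weights of the sum S = sum_i w_i B_i
p = rng.uniform(0.2, 0.8, D)        # B_i ~ Bernoulli(p_i), independent
L = w.sum()                         # S is supported on [0, L]
x = np.linspace(0.0, L, N, endpoint=False)
omega = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)

# Characteristic function of the sum = product of the component CFs.
cf = np.ones(N, dtype=complex)
for wi, pi in zip(w, p):
    cf *= (1.0 - pi) + pi * np.exp(1j * wi * omega)

# Inverse transform back to a (smeared) distribution on the grid; atoms that
# miss the grid produce exactly the Gibbs-type artifacts mentioned above.
pdf = np.clip(np.real(np.fft.fft(cf)) / L, 0.0, None)
cdf = np.cumsum(pdf) * (L / N)

level = 0.99
var = x[np.searchsorted(cdf, level)]                   # Value at Risk
tail = x >= var
es = np.sum(pdf[tail] * x[tail]) / np.sum(pdf[tail])   # Expected Shortfall
print(f"VaR at {level:.0%}: {var:.3f}, ES: {es:.3f}")
```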
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Community of Madrid > Madrid (0.04)
Contextual Graph Matching with Correlated Gaussian Features
Yarandi, Mohammad Hassan Ahmad, Ganassali, Luca
We investigate contextual graph matching in the Gaussian setting, where both edge weights and node features are correlated across two networks. We derive precise information-theoretic thresholds for exact recovery, and identify conditions under which almost exact recovery is possible or impossible, in terms of graph and feature correlation strengths, the number of nodes, and feature dimension. Interestingly, whereas an all-or-nothing phase transition is observed in the standard graph-matching scenario, the additional contextual information introduces a richer structure: thresholds for exact and almost exact recovery no longer coincide. Our results provide the first rigorous characterization of how structural and contextual information interact in graph matching, and establish a benchmark for designing efficient algorithms.
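The model can be made concrete with a small simulation. The instance below follows a standard correlated Gaussian Wigner construction (my reading of the setup; the paper's normalization may differ), together with a naive feature-only matching baseline; exploiting edge weights and features jointly is what moves the recovery thresholds the abstract characterizes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
rho_e, rho_f = 0.9, 0.8          # edge / feature correlation strengths

def gaussian_wigner(n):
    g = rng.normal(size=(n, n))
    return (g + g.T) / np.sqrt(2.0)

# Network 1: edge weights A and node features F.
A = gaussian_wigner(n)
F = rng.normal(size=(n, d))

# Network 2: correlated copies, relabeled by a latent permutation pi.
B0 = rho_e * A + np.sqrt(1 - rho_e**2) * gaussian_wigner(n)
G0 = rho_f * F + np.sqrt(1 - rho_f**2) * rng.normal(size=(n, d))
pi = rng.permutation(n)                 # ground-truth matching: i -> pi[i]
B = B0[np.ix_(pi, pi)]                  # relabeled edge weights
G = G0[pi]                              # relabeled node features

# Naive baseline using features only: match node i of network 2 to its
# nearest neighbor among network-1 features. Using A and B jointly with
# the features (the "contextual" part) is what shifts the thresholds.
est = (G @ F.T).argmax(axis=1)
print("feature-only exact-match rate:", np.mean(est == pi))
```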
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
The monotonicity of the Franz-Parisi potential is equivalent to low-degree MMSE lower bounds
Tsirkas, Konstantinos, Wang, Leda, Zadik, Ilias
Over the last decades, two distinct approaches have been instrumental to our understanding of the computational complexity of statistical estimation. The statistical physics literature predicts algorithmic hardness through local stability and monotonicity properties of the Franz--Parisi (FP) potential [Franz and Parisi, 1995, 1997], while the mathematically rigorous literature characterizes hardness via the limitations of restricted algorithmic classes, most notably low-degree polynomial estimators [Hopkins, 2017]. For many inference models, these two perspectives yield strikingly consistent predictions, giving rise to a long-standing open problem of establishing a precise mathematical relationship between them. In this work, we show that for estimation problems the power of low-degree polynomials is equivalent to the monotonicity of the annealed FP potential for a broad family of Gaussian additive models (GAMs) with signal-to-noise ratio $λ$. In particular, subject to a low-degree conjecture for GAMs, our results imply that the polynomial-time limits of these models are directly determined by the monotonicity of the annealed FP potential, in conceptual agreement with predictions from the physics literature dating back to the 1990s.
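For readers outside this literature, the two objects being related can be pinned down in a few lines; the definitions below are the standard ones for Gaussian additive models and the degree-$D$ MMSE (notation mine, not necessarily the paper's).

```latex
% Standard setup (notation mine, not necessarily the paper's):
% a Gaussian additive model with signal-to-noise ratio \lambda observes
\[
  Y = \lambda X + Z, \qquad X \sim P_X, \qquad Z \sim \mathcal{N}(0, I),
\]
% and low-degree polynomial estimation is benchmarked by the degree-$D$ MMSE
\[
  \mathrm{MMSE}_{\le D}(\lambda)
    = \inf_{f :\, \deg f \le D} \mathbb{E}\,\lVert X - f(Y) \rVert^2 .
\]
% A low-degree MMSE lower bound at \lambda says this stays near the trivial
% error \mathbb{E}\lVert X - \mathbb{E} X \rVert^2; the paper's equivalence
% ties exactly this to monotonicity of the annealed FP potential in \lambda.
```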
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
Independence clustering (without a matrix)
The independence clustering problem is considered in the following formulation: given a set $S$ of random variables, it is required to find the finest partitioning $\{U_1,\dots,U_k\}$ of $S$ into clusters such that the clusters $U_1,\dots,U_k$ are mutually independent. Since mutual independence is the target, pairwise similarity measurements are of no use, and thus traditional clustering algorithms are inapplicable. The distribution of the random variables in $S$ is, in general, unknown, but a sample is available. Thus, the problem is cast in terms of time series. Two forms of sampling are considered: i.i.d.\ and stationary time series, with the main emphasis being on the latter, more general, case. A consistent, computationally tractable algorithm for each of the settings is proposed, and a number of fascinating open directions for further research are outlined.
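For the i.i.d., finite-alphabet case, the statistical target can be prototyped with plug-in entropies: a partition $\{U_1,\dots,U_k\}$ is mutually independent exactly when its total correlation $\sum_i H(U_i) - H(S)$ vanishes, a joint criterion that no pairwise similarity matrix captures. The exhaustive sketch below (mine; not the paper's consistent time-series algorithm) illustrates this on a toy example.

```python
import numpy as np

def entropy(cols):
    # Plug-in joint entropy (in nats) of the selected columns.
    _, counts = np.unique(cols, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def partitions(items):
    # Enumerate all set partitions (Bell-number many; small |S| only).
    if len(items) == 1:
        yield [items]
        return
    first, rest = items[:1], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [first + part[i]] + part[i + 1:]
        yield [first] + part

def finest_independent(X, eps=0.05):
    # Finest partition whose empirical total correlation is below eps.
    m = X.shape[1]
    H_all = entropy(X)
    best = [list(range(m))]                  # trivial one-cluster partition
    for part in partitions(list(range(m))):
        tc = sum(entropy(X[:, U]) for U in part) - H_all
        if tc < eps and len(part) > len(best):
            best = part
    return best

# Toy demo: X0 and X1 are dependent (X1 is X0 with 10% of bits flipped),
# X2 is independent of both; the joint test groups {X0, X1} and isolates X2.
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 5_000)
x1 = x0 ^ (rng.random(5_000) < 0.1).astype(x0.dtype)
x2 = rng.integers(0, 2, 5_000)
print(finest_independent(np.stack([x0, x1, x2], axis=1)))
```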
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > New Jersey > Hudson County > Secaucus (0.04)
- North America > United States > District of Columbia > Washington (0.04)