AITopics | infinite variance

9cdf26568d166bc6793ef8da5afa0846-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 05:46:07 GMT

noise, sequence, variance, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

Neural Information Processing SystemsDec-24-2025, 14:12:48 GMT

Recent studies have provided both empirical and theoretical evidence illustrating that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails potentially result in iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of the second-order moments. In this paper, we provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance, for a class of strongly convex objectives. In the case where the $p$-th moment of the noise exists for some $p\in [1,2)$, we first identify a condition on the Hessian, coined `$p$-positive (semi-)definiteness', that leads to an interesting interpolation between the positive semi-definite cone ($p=2$) and the cone of diagonally dominant matrices with non-negative diagonal entries ($p=1$). Under this condition, we provide a convergence rate for the distance to the global optimum in $L^p$. Furthermore, we provide a generalized central limit theorem, which shows that the properly scaled Polyak-Ruppert averaging converges weakly to a multivariate $\alpha$-stable random vector.Our results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum without necessitating any modification neither to the loss function nor to the algorithm itself, as typically required in robust statistics.We demonstrate the implications of our resultsover misspecified models, in the presence of heavy-tailed data.

convergence rate, name change, stochastic gradient descent, (7 more...)

Neural Information Processing Systems

Genre: Research Report (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Add feedback

Taming Fat-Tailed ("Heavier-Tailed" with Potentially Infinite Variance) Noise in Federated Learning

Neural Information Processing SystemsDec-24-2025, 09:59:10 GMT

In recent years, federated learning (FL) has emerged as an important distributed machine learning paradigm to collaboratively learn a global model with multiple clients, while keeping data local and private. However, a key assumption in most existing works on FL algorithms' convergence analysis is that the noise in stochastic first-order information has a finite variance. Although this assumption covers all light-tailed (i.e., sub-exponential) and some heavy-tailed noise distributions (e.g., log-normal, Weibull, and some Pareto distributions), it fails for many fat-tailed noise distributions (i.e., ``heavier-tailed'' with potentially infinite variance) that have been empirically observed in the FL literature. To date, it remains unclear whether one can design convergent algorithms for FL systems that experience fat-tailed noise.

clipping, infinite variance, mathsf, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

On the Optimality of the Median-of-Means Estimator under Adversarial Contamination

de Juan, Xabier, Mazuelas, Santiago

arXiv.org Machine LearningOct-10-2025

The Median-of-Means (MoM) is a robust estimator widely used in machine learning that is known to be (minimax) optimal in scenarios where samples are i.i.d. In more grave scenarios, samples are contaminated by an adversary that can inspect and modify the data. Previous work has theoretically shown the suitability of the MoM estimator in certain contaminated settings. However, the (minimax) optimality of MoM and its limitations under adversarial contamination remain unknown beyond the Gaussian case. In this paper, we present upper and lower bounds for the error of MoM under adversarial contamination for multiple classes of distributions. In particular, we show that MoM is (minimax) optimal in the class of distributions with finite variance, as well as in the class of distributions with infinite variance and finite absolute $(1+r)$-th moment. We also provide lower bounds for MoM's error that match the order of the presented upper bounds, and show that MoM is sub-optimal for light-tailed distributions.

contamination, estimator, theorem 3, (17 more...)

arXiv.org Machine Learning

2510.07867

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Spain (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

Neural Information Processing SystemsAug-16-2025, 09:39:54 GMT

Work partially conducted while affiliated with the V ector Institute.

artificial intelligence, machine learning, variance, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

Add feedback

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

Neural Information Processing SystemsMay-27-2025, 01:07:35 GMT

Recent studies have provided both empirical and theoretical evidence illustrating that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails potentially result in iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of the second-order moments. In this paper, we provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance, for a class of strongly convex objectives. In the case where the p -th moment of the noise exists for some p\in [1,2), we first identify a condition on the Hessian, coined p -positive (semi-)definiteness', that leads to an interesting interpolation between the positive semi-definite cone ( p 2) and the cone of diagonally dominant matrices with non-negative diagonal entries ( p 1). Under this condition, we provide a convergence rate for the distance to the global optimum in L p .

artificial intelligence, machine learning, stochastic gradient descent, (6 more...)

Neural Information Processing Systems

Genre: Research Report (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance

Neural Information Processing SystemsJan-19-2025, 22:22:16 GMT

In this paper, we consider non-smooth stochastic convex optimization with two function evaluations per round under infinite noise variance. In the classical setting when noise has finite variance, an optimal algorithm, built upon the batched accelerated gradient method, was proposed in (Gasnikov et. This optimality is defined in terms of iteration and oracle complexity, as well as the maximal admissible level of adversarial noise. However, the assumption of finite variance is burdensome and it might not hold in many practical scenarios. To address this, we demonstrate how to adapt a refined clipped version of the accelerated gradient (Stochastic Similar Triangles) method from (Sadiev et al., 2023) for a two-point zero-order oracle.

accelerated zeroth-order method, infinite variance, non-smooth stochastic convex optimization problem, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.40)

Add feedback

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

Neural Information Processing SystemsJan-18-2025, 03:40:47 GMT

Recent studies have provided both empirical and theoretical evidence illustrating that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails potentially result in iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of the second-order moments. In this paper, we provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance, for a class of strongly convex objectives. In the case where the p -th moment of the noise exists for some p\in [1,2), we first identify a condition on the Hessian, coined p -positive (semi-)definiteness', that leads to an interesting interpolation between the positive semi-definite cone ( p 2) and the cone of diagonally dominant matrices with non-negative diagonal entries ( p 1). Under this condition, we provide a convergence rate for the distance to the global optimum in L p .

convergence rate, infinite noise variance, stochastic gradient descent, (4 more...)

Neural Information Processing Systems

Genre: Research Report (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Taming Fat-Tailed ("Heavier-Tailed" with Potentially Infinite Variance) Noise in Federated Learning

Neural Information Processing SystemsOct-11-2024, 12:18:18 GMT

In recent years, federated learning (FL) has emerged as an important distributed machine learning paradigm to collaboratively learn a global model with multiple clients, while keeping data local and private. However, a key assumption in most existing works on FL algorithms' convergence analysis is that the noise in stochastic first-order information has a finite variance. Although this assumption covers all light-tailed (i.e., sub-exponential) and some heavy-tailed noise distributions (e.g., log-normal, Weibull, and some Pareto distributions), it fails for many fat-tailed noise distributions (i.e., heavier-tailed'' with potentially infinite variance) that have been empirically observed in the FL literature. To date, it remains unclear whether one can design convergent algorithms for FL systems that experience fat-tailed noise. Specifically, for the largest \alpha \in (1,2] such that the fat-tailed noise in FL still has a bounded \alpha -moment, we show that both variants achieve \mathcal{O}((mT) {\frac{2-\alpha}{\alpha}}) and \mathcal{O}((mT) {\frac{1-\alpha}{3\alpha-2}}) convergence rates in the strongly-convex and general non-convex settings, respectively, where m and T are the numbers of clients and communication rounds.

clipping, federated learning, mathsf, (11 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Container > Trap (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Catoni-style Confidence Sequences under Infinite Variance

Bhatt, Sujay, Fang, Guanhua, Li, Ping, Samorodnitsky, Gennady

arXiv.org Artificial IntelligenceAug-5-2022

In this paper, we provide an extension of confidence sequences for settings where the variance of the data-generating distribution does not exist or is infinite. Confidence sequences furnish confidence intervals that are valid at arbitrary data-dependent stopping times, naturally having a wide range of applications. We first establish a lower bound for the width of the Catoni-style confidence sequences for the finite variance case to highlight the looseness of the existing results. Next, we derive tight Catoni-style confidence sequences for data distributions having a relaxed bounded~$p^{th}-$moment, where~$p \in (1,2]$, and strengthen the results for the finite variance case of~$p =2$. The derived results are shown to better than confidence sequences obtained using Dubins-Savage inequality.

catoni-style confidence sequence, confidence sequence, sequence, (12 more...)

arXiv.org Artificial Intelligence

2208.03185

Country:

North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Add feedback

Filters

Collaborating Authors

infinite variance

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

9cdf26568d166bc6793ef8da5afa0846-Paper.pdf

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

Taming Fat-Tailed ("Heavier-Tailed" with Potentially Infinite Variance) Noise in Federated Learning

On the Optimality of the Median-of-Means Estimator under Adversarial Contamination

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance

Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

Taming Fat-Tailed ("Heavier-Tailed" with Potentially Infinite Variance) Noise in Federated Learning

Catoni-style Confidence Sequences under Infinite Variance