heavy-tailed behavior



Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent

Pavasovic, Krunoslav Lehman, Durmus, Alain, Simsekli, Umut

Neural Information Processing Systems

A recent line of empirical studies has demonstrated that SGD might exhibit heavy-tailed behavior in practical settings, and the heaviness of the tails might correlate with the overall performance. In this paper, we investigate the emergence of such heavy tails. Previous works on this problem, to our knowledge, only considered online (also called single-pass) SGD, in which the emergence of heavy tails in theoretical findings is contingent upon access to an infinite amount of data. Hence, the underlying mechanism generating the reported heavy-tailed behavior in practical settings, where the amount of training data is finite, is still not well understood. Our contribution aims to fill this gap. In particular, we show that the stationary distribution of offline (also called multi-pass) SGD exhibits 'approximate' power-law tails, and the approximation error is controlled by how fast the empirical distribution of the training data converges to the true underlying data distribution in the Wasserstein metric. Our main takeaway is that, as the number of data points increases, offline SGD will behave increasingly 'power-law-like'. To achieve this result, we first prove non-asymptotic Wasserstein convergence bounds for offline SGD to online SGD as the number of data points increases, which can be of independent interest. Finally, we illustrate our theory through experiments conducted on synthetic data and neural networks.
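As a rough illustration of the single-pass vs. multi-pass distinction above, the sketch below runs SGD on a toy Gaussian linear-regression loss, either streaming fresh samples (online) or resampling a fixed dataset (offline), and applies a Hill-type estimator to the norms of the iterates. This is a minimal sketch, not the paper's experimental setup: the step size, dimensions, sample sizes, and the helper names sgd_iterates and hill_estimator are all illustrative assumptions.

```python
# Minimal sketch: tail-index estimates for online vs. offline SGD iterates
# on a toy Gaussian linear-regression loss. All hyperparameters are
# illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, eta, n_iters = 5, 0.15, 200_000


def sgd_iterates(n_data=None):
    """Run SGD on 0.5*(a^T w - y)^2; n_data=None streams a fresh sample each
    step (online/single-pass), otherwise resamples from a fixed dataset of
    size n_data (offline/multi-pass)."""
    if n_data is not None:
        A = rng.normal(size=(n_data, d))
        y = rng.normal(size=n_data)
    w = np.zeros(d)
    tail = []
    for t in range(n_iters):
        if n_data is None:
            a, b = rng.normal(size=d), rng.normal()
        else:
            i = rng.integers(n_data)
            a, b = A[i], y[i]
        w = w - eta * (a @ w - b) * a      # one stochastic gradient step
        if t > n_iters // 2:               # discard burn-in
            tail.append(np.linalg.norm(w))
    return np.array(tail)


def hill_estimator(x, k=1000):
    """Hill estimator of the tail index from the k largest order statistics."""
    x = np.sort(x)[-k:]
    return 1.0 / np.mean(np.log(x / x[0]))


for label, n in [("online", None), ("offline, n=100", 100),
                 ("offline, n=10000", 10_000)]:
    print(label, "tail-index estimate ~", round(hill_estimator(sgd_iterates(n)), 2))
```

Under the paper's takeaway, one would expect the offline estimates to approach the online one as the dataset size grows; the Hill estimates here are rough, since the toy chain only approximately reaches stationarity.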



Emergence of heavy tails in homogenized stochastic gradient descent

Jiao, Zhe, Keller-Ressel, Martin

arXiv.org Artificial Intelligence

It has repeatedly been observed that loss minimization by stochastic gradient descent leads to heavy-tailed distributions of neural network parameters. Here, we analyze a continuous diffusion approximation of SGD, called homogenized stochastic gradient descent, show that it behaves asymptotically heavy-tailed, and give explicit upper and lower bounds on its tail-index. We validate these bounds in numerical experiments and show that they are typically close approximations to the empirical tail-index of SGD iterates.

An important step in this direction has been taken in Gurbuzbalaban et al. [2021], where the tail behavior of SGD iterates is characterized in dependence on optimization parameters, dimension and Hessian curvature at the loss minimum. One limitation of Gurbuzbalaban et al. [2021] is that this link is described only qualitatively, but not quantitatively. Here, we provide an alternative approach through analyzing homogenized stochastic gradient descent, a diffusion approximation of SGD introduced in Paquette
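To see in miniature how a diffusion with state-dependent (multiplicative) noise develops power-law tails with an explicit tail index, one can simulate a one-dimensional toy SDE by Euler-Maruyama and compare a Hill estimate against the closed-form index of that toy model. This is a stand-in sketch, not the paper's homogenized SGD process: the SDE dX = -mu*X dt + sqrt(a*X^2 + b) dB and all constants are illustrative assumptions, and the closed-form tail index 1 + 2*mu/a is a standard fact for this toy SDE only.

```python
# Minimal sketch: Euler-Maruyama simulation of a toy 1-D diffusion with
# multiplicative noise, dX = -mu*X dt + sqrt(a*X^2 + b) dB, whose stationary
# law is Student-like with tail index 1 + 2*mu/a. Constants are illustrative.
import numpy as np

rng = np.random.default_rng(1)
mu, a, b = 1.0, 0.5, 1.0                 # drift strength and noise parameters
dt, n_steps, burn_in = 1e-3, 1_000_000, 200_000

noise = rng.normal(size=n_steps)         # pre-generated unit-variance increments
xs = np.empty(n_steps)
x = 0.0
for t in range(n_steps):                 # Euler-Maruyama discretization
    x += -mu * x * dt + np.sqrt((a * x * x + b) * dt) * noise[t]
    xs[t] = x

samples = np.sort(np.abs(xs[burn_in::10]))   # discard burn-in, thin by 10
k = 2000                                     # upper order statistics for Hill
hill = 1.0 / np.mean(np.log(samples[-k:] / samples[-k]))
print(f"Hill tail-index estimate: {hill:.2f} (toy-model theory: {1 + 2 * mu / a:.2f})")
```

With mu = 1.0 and a = 0.5 the toy theory predicts a tail index of 5.0; the Hill estimate is only approximate, since the power law holds exactly only in the far tail.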


Fat- and Heavy-Tailed Behavior in Satisficing Planning

Cohen, Eldan (University of Toronto) | Beck, J. Christopher (University of Toronto)

AAAI Conferences

In this work, we study the runtime distribution of satisficing planning in ensembles of random planning problems and in multiple runs of a randomized heuristic search on a single planning instance. Using common heuristic functions (such as FF) and six benchmark problem domains from the IPC, we find heavy-tailed behavior similar to that found in CSP and SAT. We investigate two notions of constrainedness, often used in the modeling of planning problems, and show that the heavy-tailed behavior tends to appear in relatively relaxed problems, where the required effort is, on average, low. Finally, we show that, as with randomized restarts in CSP and SAT solving, recent search enhancements that incorporate randomness in the search process can help mitigate the effect of the heavy tail.
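The restart effect described above can be reproduced in a toy Monte Carlo model: if individual runtimes are Pareto-distributed with tail index below 1, the mean runtime is infinite, yet restarting after a fixed cutoff yields a finite expected total time. The Pareto runtime model, the cutoff of 20, and the helper names below are illustrative assumptions, not the paper's benchmarks or solver.

```python
# Minimal sketch: heavy-tailed (Pareto) runtimes vs. runtimes under a
# fixed-cutoff restart policy. With alpha < 1 the raw mean is infinite,
# while restarts make the expected total time finite.
import numpy as np

rng = np.random.default_rng(2)
alpha, x_min, cutoff = 0.9, 1.0, 20.0    # alpha < 1 => infinite-mean runtimes


def runtime():
    """One run of the toy randomized search: Pareto(alpha) runtime via inverse CDF."""
    return x_min * (1.0 - rng.random()) ** (-1.0 / alpha)


def time_with_restarts():
    """Restart after `cutoff` time units until a run finishes within the cutoff."""
    total = 0.0
    while True:
        t = runtime()
        if t <= cutoff:
            return total + t
        total += cutoff


no_restart = np.mean([runtime() for _ in range(200_000)])
restart = np.mean([time_with_restarts() for _ in range(200_000)])
print(f"mean runtime, no restarts : {no_restart:10.1f}  (unstable; grows with sample size)")
print(f"mean runtime, restarts@20 : {restart:10.1f}")
```

The design point is the same one exploited by randomized restarts in CSP and SAT solving: truncating runs caps the contribution of the heavy tail, so the restarted mean converges even when the raw mean does not.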