AITopics | martingale difference


martingale difference


abd987257ff0eddc2bc6602538cb3c43-Supplemental.pdf

Neural Information Processing Systems, Feb-19-2026, 05:48:13 GMT

apple, martingale difference, variance, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)


abd987257ff0eddc2bc6602538cb3c43-Supplemental.pdf

Neural Information Processing Systems, Aug-15-2025, 17:55:32 GMT

apple, martingale difference, variance, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)


Deviation inequalities for stochastic approximation by averaging

Fan, Xiequan, Alquier, Pierre, Doukhan, Paul

arXiv.org Machine Learning, Feb-17-2021

A large number of probability inequalities under dependence may be found in the literature; see [13] and, more recently, [15], [17], as well as [24], [25], [6], [7], [11], or [12]. Many papers involve inequalities for Markov chains, and recent martingale-based techniques provide reasonable ones for contractive Markov chains, as in [8]; such contractive Markov chains are weakly dependent. The above references mainly correspond to the time-homogeneous contractive case, whereas we aim at proving results for time non-homogeneous Markov chains. This is the setting of the large class of models introduced in Section 1.1. Different situations of stochastic algorithms [19] and unit roots [20] correspond to such varying contraction coefficients, tending either to 0 or to 1 as n → ∞. Several relevant models fitting such conditions are considered in Section 1.2.
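To make the setting in this abstract concrete, here is a minimal sketch of a time non-homogeneous contractive chain with averaging. It is an illustration only, not one of the paper's models: the recursion `x = (1 - gamma) * x + gamma * y`, the step size `gamma = n ** (-alpha)`, the Gaussian noise, and the names `averaged_sa` and `deviation_frequency` are all assumptions chosen for this example. The contraction coefficient `1 - gamma` tends to 1 as the step index grows, and the second function estimates by simulation how often the averaged iterate deviates from the target by more than `t`.

```python
import random


def averaged_sa(n_steps, theta=1.0, alpha=0.6, noise_sd=1.0, rng=None):
    """Run a toy stochastic approximation and return the averaged iterate.

    Assumed model (hypothetical, for illustration only):
        x_n = (1 - gamma_n) * x_{n-1} + gamma_n * (theta + noise_n),
        gamma_n = n ** (-alpha).
    The contraction coefficient 1 - gamma_n changes with n, so the chain
    is time non-homogeneous.
    """
    rng = rng or random.Random()
    x = 0.0
    running_sum = 0.0
    for n in range(1, n_steps + 1):
        gamma = n ** (-alpha)
        y = theta + rng.gauss(0.0, noise_sd)  # noisy observation of theta
        x = (1.0 - gamma) * x + gamma * y     # contractive update
        running_sum += x
    return running_sum / n_steps              # average of the iterates


def deviation_frequency(n_steps, t, n_runs=200, seed=0, theta=1.0):
    """Empirical frequency of |averaged iterate - theta| > t over n_runs."""
    rng = random.Random(seed)
    hits = sum(
        abs(averaged_sa(n_steps, theta=theta, rng=rng) - theta) > t
        for _ in range(n_runs)
    )
    return hits / n_runs


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, deviation_frequency(n, t=0.3))
```

Comparing the printed frequencies for increasing `n_steps` gives a rough empirical picture of the kind of tail probability that a deviation inequality bounds. It says nothing about the constants or rates proved in the paper.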

inequality, lemma 2, proposition 3, (13 more...)

arXiv.org Machine Learning

2102.08685

Country:

Europe > France > Île-de-France > Yvelines > Cergy-Pontoise (0.04)
Europe > France > Île-de-France > Val-d'Oise > Cergy-Pontoise (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Deep Networks and the Multiple Manifold Problem

Buchanan, Sam, Gilboa, Dar, Wright, John

arXiv.org Machine Learning, Aug-25-2020

We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere. We provide an analysis of the one-dimensional case, proving for a simple manifold configuration that when the network depth $L$ is large relative to certain geometric and statistical properties of the data, the network width $n$ grows as a sufficiently large polynomial in $L$, and the number of i.i.d. samples from the manifolds is polynomial in $L$, randomly-initialized gradient descent rapidly learns to classify the two manifolds perfectly with high probability. Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem: the depth acts as a fitting resource, with larger depths corresponding to smoother networks that can more readily separate the class manifolds, and the width acts as a statistical resource, enabling concentration of the randomly-initialized network and its gradients. The argument centers around the neural tangent kernel and its role in the nonasymptotic analysis of training overparameterized neural networks; to this literature, we contribute essentially optimal rates of concentration for the neural tangent kernel of deep fully-connected networks, requiring width $n \gtrsim L\,\mathrm{poly}(d_0)$ to achieve uniform concentration of the initial kernel over a $d_0$-dimensional submanifold of the unit sphere $\mathbb{S}^{n_0-1}$, and a nonasymptotic framework for establishing generalization of networks trained in the NTK regime with structured data. The proof makes heavy use of martingale concentration to optimally treat statistical dependencies across layers of the initial random network. This approach should be of use in establishing similar results for other network architectures.

artificial intelligence, machine learning, order statistics, (19 more...)

arXiv.org Machine Learning

2008.11245

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > India (0.04)
(2 more...)

Genre: Research Report (0.81)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)
