- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel (0.04)
We focus on six methods: (i) discriminative K-means (DisKmeans) of Ye et al. (2008); (ii) a discriminative clustering formulation described in Bach and Harchaoui (2008) and Flammarion et al. (2017). We compare two classes F of feature mappings: linear functions and fully-connected neural networks with one hidden layer of 100 nodes. An epoch refers to n/B = 12 consecutive iterations. The learning curves in Figure 1 show the advantage of neural networks and demonstrate the flexibility of CURE with nonlinear function classes. One of the main obstacles is the complicated piecewise definition of f, which prevents us from obtaining closed-form formulae.
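For concreteness, here is a minimal sketch of the two feature-mapping classes being compared; the input/output dimensions and the ReLU activation are illustrative assumptions not stated in the excerpt:

```python
import torch.nn as nn

d_in, d_out, H = 20, 5, 100   # d_in, d_out are illustrative; H = 100 hidden nodes as above

# Class 1: linear feature mappings f(x) = Wx + b.
f_linear = nn.Linear(d_in, d_out)

# Class 2: fully-connected network with one hidden layer of 100 nodes
# (the activation is not specified in the excerpt; ReLU is assumed).
f_nn = nn.Sequential(nn.Linear(d_in, H), nn.ReLU(), nn.Linear(H, d_out))
```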
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Romania > Sud-Est Development Region > Constanța County > Constanța (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland (0.04)
- Asia > China (0.04)
69dd2eff9b6a421d5ce262b093bdab23-Paper.pdf
Quite interestingly, modern deep networks have been known to be powerful enough to interpolate even randomized labels (Zhang et al., 2017; Liu et al., 2020), a phenomenon that is usually referred to as memorization (Yun et al., 2019; Vershynin, 2020; Bubeck et al., 2020). In this work, we lift the spherical requirement and offer an exponential improvement on the dependence on δ.
- Europe > France (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- North America > Canada (0.04)
31784d9fc1fa0d25d04eae50ac9bf787-Paper.pdf
Indeed, in learning applications, where symmetric tensors are formed from statistical moments (higher-order covariances) or multivariate derivatives (higher-order Hessians), CP decomposition has enabled parameter estimation for mixtures of Gaussians [20, 35], generalized linear models [34], shallow neural networks [19, 24, 42], deeper networks [17, 18, 30], and hidden Markov models [5], among others.
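As a concrete illustration of the mechanism, the sketch below builds a symmetric rank-k moment-style tensor and recovers a component via tensor power iteration, one classical route to a CP decomposition in the orthogonally decomposable case; the dimensions, weights, and orthogonality are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 8, 3
# Orthogonal components (the orthogonally decomposable case, assumed for simplicity).
A = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :k]
w = np.array([3.0, 2.0, 1.0])
# Symmetric rank-k tensor T = sum_i w_i * a_i (x) a_i (x) a_i.
T = np.einsum('i,ai,bi,ci->abc', w, A, A, A)

def tensor_power_iteration(T, iters=100):
    """Recover one component of an orthogonally decomposable symmetric tensor."""
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        u = np.einsum('abc,b,c->a', T, u, u)     # the map u -> T(I, u, u)
        u /= np.linalg.norm(u)
    lam = np.einsum('abc,a,b,c->', T, u, u, u)   # eigenvalue lam = T(u, u, u)
    return lam, u

lam, u = tensor_power_iteration(T)
print(lam, np.abs(A.T @ u).round(3))   # u typically aligns with one component a_i, lam with its weight
```

With random restarts and deflation (subtracting the recovered term lam * u (x) u (x) u), this procedure recovers all k components, which is how moment tensors yield parameter estimates in the cited applications.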
Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
Filip Kovačević, Hong Chang Ji, Denny Wu, Mahdi Soltanolkotabi, Marco Mondelli
It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. However, beyond linear regression, the theoretical advantage of full-batch gradient descent (GD, which always reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) remains unclear. In this work, we consider learning a $d$-dimensional single-index model with a quadratic activation, for which it is known that one-pass SGD requires $n\gtrsim d\log d$ samples to achieve weak recovery. We first show that this $\log d$ factor in the sample complexity persists for full-batch spherical GD on the correlation loss; however, by simply truncating the activation, full-batch GD exhibits a favorable optimization landscape at $n \simeq d$ samples, thereby outperforming one-pass SGD (with the same activation) in statistical efficiency. We complement this result with a trajectory analysis of full-batch GD on the squared loss from small initialization, showing that $n \gtrsim d$ samples and $T \gtrsim\log d$ gradient steps suffice to achieve strong (exact) recovery.
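A minimal numerical sketch of the strong-recovery claim, assuming Gaussian inputs and the noiseless quadratic link y = ⟨w*, x⟩²; the step size, iteration count, and n = 8d are illustrative choices, not the paper's exact schedule:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 800                       # n a modest multiple of d (the n ~ d regime)
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)      # ground-truth direction on the unit sphere
X = rng.standard_normal((n, d))       # Gaussian inputs (assumption)
y = (X @ w_star) ** 2                 # quadratic activation of the single index

# Full-batch gradient descent on the squared loss, from small initialization.
w = 1e-3 * rng.standard_normal(d)
lr = 0.5 / d
for t in range(2000):
    s = X @ w
    grad = (4.0 / n) * X.T @ ((s**2 - y) * s)   # grad of (1/n) * sum((<w,x>^2 - y)^2)
    w -= lr * grad

overlap = abs(w @ w_star) / np.linalg.norm(w)
print(f"overlap = {overlap:.3f}")     # close to 1 indicates strong recovery of ±w_star
```

In this toy run the small initialization is first amplified along w* over roughly log d steps, after which the residual drives the iterate to ±w*, matching the qualitative picture in the abstract.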
- North America > United States > California (0.14)
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
Inverse Mixed-Integer Programming: Learning Constraints then Objective Functions
In mixed-integer linear programming, data-driven inverse optimization, which learns the objective function and the constraints from observed data, plays an important role in constructing appropriate mathematical models in various fields, including power systems and scheduling. However, to the best of our knowledge, there is no known method for learning both the objective function and the constraints. In this paper, we propose a two-stage method for a class of problems in which the objective function is expressed as a linear combination of functions and the constraints are represented by functions and thresholds. Specifically, our method first learns the constraints and then learns the objective function. On the theoretical side, we show that the proposed method can solve inverse optimization problems from a finite dataset, develop a statistical learning theory for pseudometric spaces and sub-Gaussian distributions, and construct a statistical learning framework for inverse optimization. On the experimental side, we demonstrate that our method is practically applicable to scheduling problems formulated as integer linear programs with up to 100 decision variables, which are typical in real-world settings.
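To make the two-stage pipeline concrete, here is a hedged toy sketch on a small integer grid; the feature functions phi_k, constraint functions g_j, and the max-margin grid search are illustrative stand-ins, not the paper's construction:

```python
import itertools
import numpy as np

# Observed solutions are optima of  max_x  sum_k c_k * phi_k(x)
#                                   s.t.   g_j(x) <= b_j,  x integer;
# we learn the thresholds b first, then the objective weights c.
phis = [lambda x: x[0] + x[1], lambda x: -x[0]]      # candidate objective features phi_k
gs   = [lambda x: x[0] + 2 * x[1], lambda x: x[0]]   # known constraint functions g_j

# Generate one "observed" optimum from hidden c*, b* (for demonstration only).
grid = list(itertools.product(range(6), repeat=2))
c_true, b_true = np.array([1.0, 0.3]), np.array([8.0, 4.0])
feasible = [x for x in grid if all(g(x) <= b for g, b in zip(gs, b_true))]
obs = [max(feasible, key=lambda x: c_true @ np.array([p(x) for p in phis]))]

# Stage 1: tightest thresholds that keep every observed solution feasible.
b_hat = np.array([max(g(x) for x in obs) for g in gs])

# Stage 2: choose weights (grid search over the 1-simplex here) so that each observed
# solution is optimal, with maximal margin, among points feasible for the learned constraints.
feas_hat = [x for x in grid if all(g(x) <= b for g, b in zip(gs, b_hat))]
score = lambda c, x: c @ np.array([p(x) for p in phis])
best_c, best_margin = None, -np.inf
for wgt in np.linspace(0.0, 1.0, 21):
    c = np.array([wgt, 1.0 - wgt])
    margin = min(score(c, x) - max(score(c, z) for z in feas_hat if z != x) for x in obs)
    if margin > best_margin:
        best_c, best_margin = c, margin

print("learned thresholds:", b_hat)   # tight at the observed optimum
print("learned weights:", best_c)     # a max-margin point of the cone of consistent objectives
```

Note that inverse optimization identifies the objective only up to the cone of weights under which the observations are optimal; the max-margin rule above is just one way to pick a representative from that cone.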
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > France (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (2 more...)