AITopics | statement follow

Collaborating Authors

statement follow

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On the Rate of Gaussian Approximation for Linear Regression Problems

Khusainov, Marat, Sheshukova, Marina, Durmus, Alain, Samsonov, Sergey

arXiv.org Machine LearningSep-18-2025

In this paper, we consider the problem of Gaussian approximation for the online linear regression task. We derive the corresponding rates for the setting of a constant learning rate and study the explicit dependence of the convergence rate upon the problem dimension $d$ and quantities related to the design matrix. When the number of iterations $n$ is known in advance, our results yield the rate of normal approximation of order $\sqrt{\log{n}/n}$, provided that the sample size $n$ is large enough.

approximation, recurrence, stochastic approximation, (13 more...)

arXiv.org Machine Learning

2509.14039

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.61)

Add feedback

Pruning is Optimal for Learning Sparse Features in High-Dimensions

Vural, Nuri Mert, Erdogdu, Murat A.

arXiv.org Machine LearningJun-12-2024

While it is commonly observed in practice that pruning networks to a certain level of sparsity can improve the quality of the features, a theoretical explanation of this phenomenon remains elusive. In this work, we investigate this by demonstrating that a broad class of statistical models can be optimally learned using pruned neural networks trained with gradient descent, in high-dimensions. We consider learning both single-index and multi-index models of the form $y = \sigma^*(\boldsymbol{V}^{\top} \boldsymbol{x}) + \epsilon$, where $\sigma^*$ is a degree-$p$ polynomial, and $\boldsymbol{V} \in \mathbbm{R}^{d \times r}$ with $r \ll d$, is the matrix containing relevant model directions. We assume that $\boldsymbol{V}$ satisfies a certain $\ell_q$-sparsity condition for matrices and show that pruning neural networks proportional to the sparsity level of $\boldsymbol{V}$ improves their sample complexity compared to unpruned networks. Furthermore, we establish Correlational Statistical Query (CSQ) lower bounds in this setting, which take the sparsity level of $\boldsymbol{V}$ into account. We show that if the sparsity level of $\boldsymbol{V}$ exceeds a certain threshold, training pruned networks with a gradient descent algorithm achieves the sample complexity suggested by the CSQ lower bound. In the same scenario, however, our results imply that basis-independent methods such as models trained via standard gradient descent initialized with rotationally invariant random weights can provably achieve only suboptimal sample complexity.

neural network, nullv, statement follow, (15 more...)

arXiv.org Machine Learning

2406.08658

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(6 more...)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

Add feedback

Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates

Grazzi, Riccardo, Pontil, Massimiliano, Salzo, Saverio

arXiv.org Machine LearningMar-28-2024

Important examples are given by hyperparameter optimization and meta-learning (Franceschi et al., 2018; Lee et al., 2019), where (1) expresses the optimality conditions of a lower-level minimization problem. Further examples include learning a surrogate model for data poisoning attacks (Xiao et al., 2015; Muñoz-González et al., 2017), deep equilibrium models (Bai et al., 2019) or OptNet (Amos & Kolter, 2017). All these problems may present nonsmooth mappings Φ. For instance, consider hyperparameter optimization or data poisoning attacks for SVMs, or meta-learning for image classification, where Φ is evaluated through the forward pass of a neural net with RELU activations (Bertinetto et al., 2019; Lee et al., 2019; Rajeswaran et al., 2019). In addition, when such settings are applied to large datasets, evaluating the map Φ would be too costly, but we can usually apply stochastic methods through the composite stochastic structure in (2), where only T involves a computation on the full training set (e.g., a gradient descent step).

conservative derivative, differentiation, optimization, (13 more...)

arXiv.org Machine Learning

2403.11687

Country:

Europe > Italy (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Convergence Properties of Stochastic Hypergradients

Grazzi, Riccardo, Pontil, Massimiliano, Salzo, Saverio

arXiv.org Machine LearningNov-13-2020

Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems in the design of optimization algorithms for bilevel optimization is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset. We provide iteration complexity bounds for the mean square error of the hypergradient approximation, under the assumption that the lower-level problem is accessible only through a stochastic mapping which is a contraction in expectation. Preliminary numerical experiments support our theoretical analysis.

assumption, convergence property, hypergradient, (14 more...)

arXiv.org Machine Learning

2011.07122

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Unsupervised model-free representation learning

Ryabko, Daniil

arXiv.org Machine LearningDec-25-2019

Numerous control and learning problems face the situation where sequences of high-dimensional highly dependent data are available but no or little feedback is provided to the learner, which makes any inference rather challenging. To address this challenge, we formulate the following problem. Given a series of observations $X_0,\dots,X_n$ coming from a large (high-dimensional) space $\mathcal X$, find a representation function $f$ mapping $\mathcal X$ to a finite space $\mathcal Y$ such that the series $f(X_0),\dots,f(X_n)$ preserves as much information as possible about the original time-series dependence in $X_0,\dots,X_n$. We show that, for stationary time series, the function $f$ can be selected as the one maximizing a certain information criterion that we call time-series information. Some properties of this functions are investigated, including its uniqueness and consistency of its empirical estimates. Implications for the problem of optimal control are presented.

conditionally independent, information, representation, (14 more...)

arXiv.org Machine Learning

doi: 10.1109/TIT.2019.2961814

1304.4806

Country:

North America > United States (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

Add feedback