Collaborating Authors

 Dana, Léo


Convergence of Shallow ReLU Networks on Weakly Interacting Data

arXiv.org Machine Learning

Understanding the properties of models used in machine learning is crucial for providing guarantees to downstream users. Of particular importance, the convergence of the training process under gradient methods stands as one of the first issues to address in order to comprehend them. While such questions are well understood for linear models and convex optimization problems (Bottou et al., 2018; Bach, 2024), this is not the case for neural networks, which are the most widely used models in large-scale machine learning. This paper focuses on providing quantitative convergence guarantees for a one-hidden-layer neural network. Theoretically, the global convergence analysis of neural networks has seen two main achievements in recent years: (i) the identification of the lazy regime, due to a particular initialization, in which convergence is always guaranteed at the cost of the network behaving essentially as a linear model (Jacot et al., 2018; Arora et al., 2019; Chizat et al., 2019), and (ii) the proof that, with an infinite number of hidden units, a two-layer neural network converges towards the global minimizer of the loss (Mei et al., 2018; Chizat and Bach, 2018; Rotskoff and Vanden-Eijnden, 2018). However, neural networks are trained in practice outside of these regimes: they are known to perform feature learning, and experimentally reach a global minimum with a large but finite number of neurons. Quantifying in which regimes neural networks converge to a global minimum of their loss remains an important open question.
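For context on the object under study, the following is a minimal NumPy sketch (not code from the paper) of gradient descent on a one-hidden-layer ReLU network under the mean-field 1/m output scaling mentioned above; all sizes, the target function, and the step-size rule are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: n points in d dimensions (all sizes are illustrative).
n, d, m = 20, 5, 512                       # samples, input dim, hidden units
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))     # arbitrary smooth target

# Mean-field parameterisation: f(x) = (1/m) * sum_j a_j * relu(<w_j, x>).
# (The lazy/NTK regime corresponds to a different output scaling, under which
#  the features w_j barely move during training.)
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

def forward(X):
    H = np.maximum(X @ W.T, 0.0)           # (n, m) hidden ReLU activations
    return H @ a / m, H

lr = 0.05 * m                              # step size scaled with the width
for step in range(2000):
    pred, H = forward(X)
    r = pred - y                           # residuals, shape (n,)
    # Hand-written gradients of the loss 0.5 * mean(r ** 2).
    grad_a = H.T @ r / (n * m)
    grad_W = ((H > 0) * r[:, None] * a[None, :]).T @ X / (n * m)
    a -= lr * grad_a
    W -= lr * grad_W

print("final training loss:", 0.5 * np.mean((forward(X)[0] - y) ** 2))
```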


Memorization in Attention-only Transformers

arXiv.org Artificial Intelligence

Recent research has explored the memorization capacity of multi-head attention, but these findings are constrained by unrealistic limitations on the context size. We present a novel proof for language-based Transformers that extends the current hypothesis to any context size. Our approach improves upon the state-of-the-art by achieving more effective exact memorization with an attention layer, while also introducing the concept of approximate memorization of distributions. Through experimental validation, we demonstrate that our proposed bounds more accurately reflect the true memorization capacity of language models, and provide a precise comparison with prior work.
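To fix ideas about the architecture whose capacity is being bounded, here is a minimal NumPy sketch of a single-head, attention-only block (no MLP); it is purely illustrative of the setting, not the memorization construction from the paper, and all shapes and weight names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(X, Wq, Wk, Wv, Wo):
    """Single-head attention block with no MLP (attention-only)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (T, T) attention weights
    return A @ V @ Wo                             # (T, d) outputs

# Toy setup: T tokens of dimension d (hypothetical sizes).
T, d, dk = 8, 16, 16
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, dk)) for _ in range(3))
Wo = rng.standard_normal((dk, d))

Y = attention_layer(X, Wq, Wk, Wv, Wo)
print(Y.shape)   # (8, 16)
```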