AITopics | bottou

Reducing Noise in GAN Training with Variance Reduced Extragradient

Neural Information Processing Systems, Feb-12-2026, 06:36:32 GMT

artificial intelligence, bach, lacoste-julien, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.05)
Europe > Switzerland > Vaud > Lausanne (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence (0.47)

These Are Not All the Features You Are Looking For: A Fundamental Bottleneck in Supervised Pretraining

Yang, Xingyu Alice, Zhang, Jianyu, Bottou, Léon

arXiv.org Machine Learning, Jun-27-2025

Transfer learning is a cornerstone of modern machine learning, promising a way to adapt models pretrained on a broad mix of data to new tasks with minimal new data. However, a significant challenge remains in ensuring that transferred features are sufficient to handle unseen datasets, amplified by the difficulty of quantifying whether two tasks are "related". To address these challenges, we evaluate model transfer from a pretraining mixture to each of its component tasks, assessing whether pretrained features can match the performance of task-specific direct training. We identify a fundamental limitation in deep learning models -- an "information saturation bottleneck" -- where networks fail to learn new features once they encode similar competing features during training. When restricted to learning only a subset of key features during pretraining, models will permanently lose critical features for transfer and perform inconsistently on data distributions, even components of the training mixture. Empirical evidence from published studies suggests that this phenomenon is pervasive in deep learning architectures -- factors such as data distribution or ordering affect the features that current representation learning methods can learn over time. This study suggests that relying solely on large-scale networks may not be as effective as focusing on task-specific training, when available. We propose richer feature representations as a potential solution to better generalize across new datasets and, specifically, present existing methods alongside a novel approach, the initial steps towards addressing this challenge.

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Machine Learning

2506.18221

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > Promising Solution (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fine-tuning with Very Large Dropout

Zhang, Jianyu, Bottou, Léon

arXiv.org Artificial Intelligence, Mar-1-2024

It is impossible today to pretend that the practice of machine learning is compatible with the idea that training and testing data follow the same distribution. Several authors have recently used ensemble techniques to show how scenarios involving multiple data distributions are best served by representations that are both richer than those obtained by regularizing for the best in-distribution performance, and richer than those obtained under the influence of the implicit sparsity bias of common stochastic gradient procedures. This contribution investigates the use of very high dropout rates instead of ensembles to obtain such rich representations. Although training a deep network from scratch using such dropout rates is virtually impossible, fine-tuning a large pre-trained model under such conditions is not only possible but also achieves out-of-distribution performances that exceed those of both ensembles and weight averaging methods such as model soups. This result has practical significance because the importance of the fine-tuning scenario has considerably grown in recent years. This result also provides interesting insights on the nature of rich representations and on the intrinsically linear nature of fine-tuning a large network using a comparatively small dataset.

dropout, fine-tuning, representation, (13 more...)

arXiv.org Artificial Intelligence

2403.00946

Country:

North America > United States > New York (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Perceptrons, Reissue of the 1988 Expanded Edition with a new foreword by Léon Bottou: An Introduction to Computational Geometry (The MIT Press): Minsky, Marvin, Papert, Seymour A., Bottou, Leon: 9780262534772: Amazon.com: Books

#artificialintelligence, Aug-4-2022, 09:50:31 GMT

Perceptrons: An Introduction to Computational Geometry, by Marvin Minsky and Seymour A. Papert. Reissue of the 1988 Expanded Edition with a new foreword by Léon Bottou (The MIT Press), listed on Amazon.com.

bottou, computational geometry, new foreword, (15 more...)

#artificialintelligence

Industry: Retail > Online (0.60)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.80)

The Paradigm Shift of Self-Supervised Learning

#artificialintelligence, May-23-2019, 20:42:36 GMT

"If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don't know how to make the cake." By 2016, Yann LeCun had begun to hedge his use of the term "unsupervised learning". At NIPS 2016, he started to refer to it by the even more nebulous term "predictive learning". I have always had trouble with the use of the term "Unsupervised Learning". In 2017, I predicted that unsupervised learning would not progress much, saying that "there seems to be a massive conceptual disconnect as to how exactly it should work" and that it was the "dark matter" of machine learning.

artificial intelligence, inductive learning, machine learning, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.73)

From GAN to WGAN

@machinelearnbot, Feb-5-2018, 17:19:45 GMT

This post explains the maths behind the generative adversarial network (GAN) model and why it is hard to train. Wasserstein GAN is intended to improve GAN training by adopting a smooth metric for measuring the distance between two probability distributions. GANs have shown great results in many generative tasks that replicate rich real-world content such as images, human language, and music. The approach is inspired by game theory: two models, a generator and a critic, compete with each other while making each other stronger at the same time. However, training a GAN is rather challenging, as people face issues such as training instability or failure to converge. Here I would like to explain the maths behind the generative adversarial network framework, why it is hard to train, and finally introduce a modified version of GAN intended to solve the training difficulties.
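To make the "smooth metric" remark concrete, here is a minimal sketch comparing the Wasserstein-1 distance with the Jensen-Shannon divergence on two point masses that are moved apart. The choice of Jensen-Shannon as the comparison, the function names, and the 11-bin grid are assumptions made for illustration; none of them come from the excerpt.

```python
import math


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D samples with uniform weights."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal transport plan matches sorted samples.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL sum.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def point_mass(position, size):
    """Histogram with all probability in a single bin (hypothetical helper)."""
    return [1.0 if i == position else 0.0 for i in range(size)]


if __name__ == "__main__":
    # Move the second point mass progressively further from the first.
    for shift in (1, 5, 10):
        w1 = wasserstein_1d([0.0], [float(shift)])
        js = js_divergence(point_mass(0, 11), point_mass(shift, 11))
        print(f"shift={shift}  W1={w1:.3f}  JS={js:.3f}")
```

For non-overlapping point masses the Jensen-Shannon value stays at log 2 whatever the shift, while the Wasserstein-1 value grows with the shift, so only the latter tells a generator how far it still has to move.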

artificial intelligence, discriminator, machine learning, (17 more...)

@machinelearnbot

Industry: Leisure & Entertainment > Games (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Facebook's Quest to Build an Artificial Brain Depends on This Guy

AITopics Original Links, Jan-18-2017, 12:08:10 GMT

Mark Zuckerberg recently handpicked the longtime NYU professor to run Facebook's new artificial intelligence lab. The IEEE Computational Intelligence Society just gave him its prestigious Neural Network Pioneer Award, in honor of his work on deep learning, a form of artificial intelligence meant to more closely mimic the human brain. And, perhaps most of all, deep learning has suddenly spread across the commercial tech world, from Google to Microsoft to Baidu to Twitter, just a few years after most AI researchers openly scoffed at it. All of these tech companies are now exploring a particular type of deep learning called convolutional neural networks, aiming to build web services that can do things like automatically understand natural language and recognize images. At China's Baidu, they drive a new visual search engine.

artificial intelligence, lecun, machine learning, (15 more...)

AITopics Original Links

Country:

Asia > China (0.24)
Europe > France (0.05)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Personal (0.34)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Counterfactual Reasoning and Learning Systems

Bottou, Léon, Peters, Jonas, Quiñonero-Candela, Joaquin, Charles, Denis X., Chickering, D. Max, Portugaly, Elon, Ray, Dipankar, Simard, Patrice, Snelson, Ed

arXiv.org Artificial Intelligence, Jul-27-2013

This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select changes that improve both the short-term and long-term performance of such systems. This work is illustrated by experiments carried out on the ad placement system associated with the Bing search engine.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1209.2355

Country:

North America > United States (0.67)
Europe > United Kingdom > England (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry:

Health & Medicine (1.00)
Marketing (0.68)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Convergence Properties of the K-Means Algorithms

Bottou, Léon, Bengio, Yoshua

Neural Information Processing Systems, Dec-31-1995

K-Means is a popular clustering algorithm used in many applications, including the initialization of more computationally expensive algorithms (Gaussian mixtures, Radial Basis Functions, Learning Vector Quantization and some Hidden Markov Models). The practice of this initialization procedure often gives the frustrating feeling that K-Means performs most of the task in a small fraction of the overall time. This motivated us to better understand this convergence speed. A second reason lies in the traditional debate between hard threshold (e.g.

algorithm, k-means, prototype, (13 more...)
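For readers unfamiliar with the algorithm named in this entry, the following is a minimal sketch of standard batch K-Means, alternating an assignment step and a centroid update. It is not the paper's analysis or any variant the paper studies; the function name, the naive "first k points" initialization, and the toy data are assumptions made for illustration.

```python
def kmeans(points, k, iterations=20):
    """Batch K-Means on a list of equal-length numeric tuples.

    Returns (centroids, assignments). Initialization simply takes the
    first k points, which is an assumption of this sketch.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments


if __name__ == "__main__":
    data = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels)
```

On well-separated toy data like this, the assignments stop changing after the first pass, which gives a small-scale feel for the fast early progress the abstract mentions.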

Neural Information Processing Systems

Country: