
Collaborating Authors

 Sadrtdinov, Ildus


Where Do Large Learning Rates Lead Us?

arXiv.org Machine Learning

It is generally accepted that starting neural network training with large learning rates (LRs) improves generalization. Following a line of research devoted to understanding this effect, we conduct an empirical study in a controlled setting focusing on two questions: 1) how large an initial LR is required to obtain optimal quality, and 2) what are the key differences between models trained with different LRs? We discover that only a narrow range of initial LRs, slightly above the convergence threshold, leads to optimal results after fine-tuning with a small LR or weight averaging. By studying the local geometry of the reached minima, we observe that using LRs from this optimal range allows the optimization to locate a basin that contains only high-quality minima. Additionally, we show that these initial LRs result in a sparse set of learned features, with a clear focus on those most relevant to the task. In contrast, starting training with too small LRs leads to unstable minima and attempts to learn all features simultaneously, resulting in poor generalization. Conversely, using initial LRs that are too large fails to locate a basin with good solutions or to extract meaningful patterns from the data.
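As a loose illustration of the recipe studied here (a phase of training with a large initial LR, followed by fine-tuning with a small LR or by weight averaging), the PyTorch-style sketch below shows one way such a two-phase schedule could be set up. The function name and all hyperparameter values (large_lr, small_lr, the phase lengths) are assumptions for illustration, not the authors' exact protocol.

    # Illustrative sketch (not the authors' exact protocol): train with a large
    # initial LR, then either fine-tune with a small LR or average late weights.
    import copy
    import torch

    def train_two_phase(model, loader, loss_fn,
                        large_lr=0.5, small_lr=0.01,        # assumed values
                        large_lr_epochs=50, finetune_epochs=50,
                        average_weights=False):
        opt = torch.optim.SGD(model.parameters(), lr=large_lr, momentum=0.9)

        def run_epoch():
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

        # Phase 1: training with a large initial LR (the paper finds that only
        # a narrow range just above the convergence threshold works best here).
        for _ in range(large_lr_epochs):
            run_epoch()

        # Phase 2: drop to a small LR and either fine-tune or average weights
        # collected along the small-LR trajectory.
        for g in opt.param_groups:
            g["lr"] = small_lr

        snapshots = []
        for _ in range(finetune_epochs):
            run_epoch()
            if average_weights:
                snapshots.append(copy.deepcopy(model.state_dict()))

        if average_weights and snapshots:
            avg = copy.deepcopy(snapshots[-1])
            for k in avg:
                if avg[k].dtype.is_floating_point:
                    avg[k] = sum(s[k] for s in snapshots) / len(snapshots)
            model.load_state_dict(avg)
        return model

Whichever second phase is used, the key knob in the paper's analysis is the large LR of the first phase: only a narrow range just above the convergence threshold lands in the well-behaved basin described above.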


To Stay or Not to Stay in the Pre-train Basin: Insights on Ensembling in Transfer Learning

arXiv.org Machine Learning

Transfer learning and ensembling are two popular techniques for improving the performance and robustness of neural networks. Due to the high cost of pre-training, ensembles of models fine-tuned from a single pre-trained checkpoint are often used in practice. Such models end up in the same basin of the loss landscape, which we call the pre-train basin, and thus have limited diversity. In this work, we show that ensembles trained from a single pre-trained checkpoint may be improved by better exploring the pre-train basin; however, leaving the basin results in losing the benefits of transfer learning and in degradation of the ensemble quality. Based on an analysis of existing exploration methods, we propose StarSSE, a more effective modification of Snapshot Ensembles (SSE) for the transfer learning setup, which results in stronger ensembles and uniform model soups.
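To make the setup concrete, the sketch below fine-tunes several ensemble members independently from one pre-trained checkpoint (a "star" pattern around it) and builds a uniform model soup by averaging their weights. This is only a generic illustration under assumed function names and hyperparameters; the actual StarSSE schedule and exploration strategy are those specified in the paper.

    # Generic sketch of ensembling from a single pre-trained checkpoint:
    # each member is fine-tuned independently from the same starting point,
    # and a uniform soup averages the resulting weights.
    import copy
    import torch

    def make_star_ensemble(make_model, pretrained_state, loader, loss_fn,
                           n_members=4, lr=0.05, epochs=10):   # assumed values
        members = []
        for seed in range(n_members):
            torch.manual_seed(seed)     # per-member randomness (dropout, etc.)
            model = make_model()
            model.load_state_dict(copy.deepcopy(pretrained_state))
            opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
            for _ in range(epochs):
                for x, y in loader:
                    opt.zero_grad()
                    loss_fn(model(x), y).backward()
                    opt.step()
            members.append(model)
        return members

    def uniform_soup(make_model, members):
        # Average member weights into a single model.
        states = [m.state_dict() for m in members]
        soup = copy.deepcopy(states[0])
        for k in soup:
            if soup[k].dtype.is_floating_point:
                soup[k] = sum(s[k] for s in states) / len(states)
        model = make_model()
        model.load_state_dict(soup)
        return model

Averaging weights into a soup is only meaningful while the members remain in a shared basin, which is exactly the trade-off between exploring the pre-train basin and leaving it that the paper analyzes.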


On the Memorization Properties of Contrastive Learning

arXiv.org Machine Learning

However, data labeling is often time-consuming and costly, as it involves human expertise. Thus, it is common in computer vision to pretrain DNNs on some large labeled dataset, e.g. ImageNet (Russakovsky et al., 2015), and then to fine-tune the model to a specific downstream task. The self-supervised learning paradigm provides a human labeling-free alternative to supervised pretraining: recently developed contrastive self-supervised methods show results comparable to ImageNet pretraining. Memorization studies of DNNs motivate improvements to DNN training approaches. A pioneering work of Zhang et al. (2017) showed that the capacity of modern DNNs is sufficient to fit perfectly even randomly labeled data. According to classic learning theory, such a huge capacity should lead to catastrophic overfitting; however, recent works (Nakkiran et al., 2020) show that in practice, increasing DNN capacity further improves generalization.
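The random-label experiment of Zhang et al. (2017) referenced above can be reproduced in miniature: shuffle the training labels so they carry no signal about the inputs and check whether the network still fits them. The sketch below is a hypothetical illustration with an assumed model and a tensor-format dataset, not the setup used in the paper.

    # Illustrative random-label memorization test (after Zhang et al., 2017).
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def fit_random_labels(model, inputs, labels, lr=0.01, epochs=100):  # assumed values
        # Shuffle the labels so they carry no information about the inputs.
        shuffled = labels[torch.randperm(len(labels))]
        loader = DataLoader(TensorDataset(inputs, shuffled),
                            batch_size=128, shuffle=True)
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        # Training accuracy near 100% on the shuffled labels means the network
        # has enough capacity to memorize the data outright.
        with torch.no_grad():
            preds = model(inputs).argmax(dim=1)
        return (preds == shuffled).float().mean().item()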