Mallik, Neeratyoy
Warmstarting for Scaling Language Models
Mallik, Neeratyoy, Janowski, Maciej, Hog, Johannes, Rakotoarison, Herilalaina, Klein, Aaron, Grabocka, Josif, Hutter, Frank
Scaling model sizes to scale performance has worked remarkably well for the current large language model paradigm. Empirical findings from various scaling studies have led to novel scaling results and laws that guide subsequent research. The high training costs at contemporary scales of data and models mean there is no thorough understanding of how to tune and arrive at such training setups. One direction to ameliorate the cost of pretraining large models is to warmstart the large-scale training from smaller models that are cheaper to tune. In this work, we attempt to understand whether the behavior of optimal hyperparameters can be retained under warmstarting for scaling. We explore simple operations that allow the application of theoretically motivated methods of zero-shot transfer of optimal hyperparameters using {\mu}Transfer. We investigate the aspects that contribute to the speedup in convergence and the preservation of stable training dynamics under warmstarting with {\mu}Transfer. We find that shrinking the smaller model's weights, zero-padding, and perturbing the resulting larger model with scaled initialization from {\mu}P enable effective warmstarting with {\mu}Transfer.
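To make the weight-expansion idea concrete, here is a minimal sketch assuming square linear layers; the helper name expand_weight, the shrink factor, and the noise scale are illustrative assumptions and are not the paper's actual operations or constants.

import numpy as np

def expand_weight(w_small, d_large, shrink=0.5, noise_scale=1.0, seed=0):
    """Embed a (d_small x d_small) weight into a (d_large x d_large) weight."""
    rng = np.random.default_rng(seed)
    d_small = w_small.shape[0]
    w_large = np.zeros((d_large, d_large))
    # 1) shrink the small model's weights and copy them into the top-left block;
    #    the remaining entries stay zero (zero-padding)
    w_large[:d_small, :d_small] = shrink * w_small
    # 2) perturb the whole matrix with a muP-style initialization whose standard
    #    deviation scales with 1/sqrt(fan_in) of the *large* model (illustrative)
    w_large += rng.normal(0.0, noise_scale / np.sqrt(d_large), size=(d_large, d_large))
    return w_large

w_small = np.random.default_rng(1).normal(0.0, 1.0 / np.sqrt(256), size=(256, 256))
w_large = expand_weight(w_small, d_large=1024)
print(w_large.shape)  # (1024, 1024)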
In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization
Rakotoarison, Herilalaina, Adriaensen, Steven, Mallik, Neeratyoy, Garibov, Samir, Bergman, Edward, Hutter, Frank
With the increasing computational costs associated with deep learning, automated hyperparameter optimization methods that rely strongly on black-box Bayesian optimization (BO) face limitations. Freeze-thaw BO offers a promising grey-box alternative, strategically allocating scarce resources incrementally to different configurations. However, the frequent surrogate model updates inherent to this approach pose challenges for existing methods, requiring retraining or fine-tuning their neural network surrogates online, introducing overhead, instability, and hyper-hyperparameters. In this work, we propose FT-PFN, a novel surrogate for freeze-thaw style BO. FT-PFN is a prior-data fitted network (PFN) that leverages transformers' in-context learning ability to efficiently and reliably perform Bayesian learning curve extrapolation in a single forward pass. Our empirical analysis across three benchmark suites shows that the predictions made by FT-PFN are more accurate and 10-100 times faster than those of the deep Gaussian process and deep ensemble surrogates used in previous work. Furthermore, we show that, when combined with our novel acquisition mechanism (MFPI-random), the resulting in-context freeze-thaw BO method (ifBO) yields new state-of-the-art performance on the same three families of deep learning HPO benchmarks considered in prior work.
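The loop below is a schematic of freeze-thaw BO with an in-context surrogate, not the ifBO implementation; ToySurrogate, mfpi_random, and train_one_step are placeholder names, and the flat extrapolation and random-horizon acquisition are deliberately naive stand-ins for FT-PFN and MFPI-random.

import random

class ToySurrogate:
    """Stand-in for FT-PFN: conditions on all observed curves as in-context examples."""
    def fit_context(self, history):            # history: {config_id: [losses per step]}
        self.history = history

    def extrapolate(self, config_id, horizon):
        curve = self.history.get(config_id, [])
        return curve[-1] if curve else 1.0      # naive flat extrapolation

def mfpi_random(surrogate, candidates):
    """Placeholder acquisition: best predicted loss at a randomly drawn horizon
    (the real MFPI-random also draws a random improvement threshold)."""
    horizon = random.randint(1, 20)
    return min(candidates, key=lambda c: surrogate.extrapolate(c, horizon))

def freeze_thaw_loop(configs, train_one_step, budget):
    history = {c: [] for c in configs}
    surrogate, best = ToySurrogate(), float("inf")
    for _ in range(budget):
        surrogate.fit_context(history)              # a single forward pass in FT-PFN
        chosen = mfpi_random(surrogate, configs)    # decide which config to "thaw"
        loss = train_one_step(chosen, len(history[chosen]))  # run one more unit, then "freeze"
        history[chosen].append(loss)
        best = min(best, loss)
    return best, history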
Fast Benchmarking of Asynchronous Multi-Fidelity Optimization on Zero-Cost Benchmarks
Watanabe, Shuhei, Mallik, Neeratyoy, Bergman, Edward, Hutter, Frank
While deep learning has celebrated many successes, its results often hinge on the meticulous selection of hyperparameters (HPs). However, the time-consuming nature of deep learning training makes HP optimization (HPO) a costly endeavor, slowing down the development of efficient HPO tools. While zero-cost benchmarks, which provide performance and runtime without actual training, offer a solution for non-parallel setups, they fall short in parallel setups, as each worker must communicate its queried runtime so that evaluations return in the exact order. This work addresses this challenge by introducing a user-friendly Python package that facilitates efficient parallel HPO with zero-cost benchmarks. Our approach calculates the exact return order based on information stored in the file system, eliminating the need for long waiting times and enabling much faster HPO evaluations. We first verify the correctness of our approach through extensive testing, and experiments with 6 popular HPO libraries show its applicability to diverse libraries and its ability to achieve over 1000x speedup compared to a traditional approach. Our package can be installed via pip install mfhpo-simulator.
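The core bookkeeping can be illustrated independently of the package's actual API: simulated_return_order below is a hypothetical helper, and the real simulator additionally interleaves the optimizer's decisions with the simulated clock.

import heapq

def simulated_return_order(queries, n_workers):
    """queries: list of (config, queried_runtime) in submission order.
    Returns configs in the order they would finish with n_workers in parallel,
    without actually waiting for the runtimes to elapse."""
    workers = [0.0] * n_workers          # simulated time at which each worker becomes free
    heapq.heapify(workers)
    finish = []
    for config, runtime in queries:
        start = heapq.heappop(workers)   # the earliest-free worker picks up the job
        end = start + runtime
        heapq.heappush(workers, end)
        finish.append((end, config))
    return [cfg for _, cfg in sorted(finish)]

jobs = [("a", 5.0), ("b", 1.0), ("c", 3.0), ("d", 0.5)]
print(simulated_return_order(jobs, n_workers=2))  # ['b', 'c', 'd', 'a']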
PriorBand: Practical Hyperparameter Optimization in the Age of Deep Learning
Mallik, Neeratyoy, Bergman, Edward, Hvarfner, Carl, Stoll, Danny, Janowski, Maciej, Lindauer, Marius, Nardi, Luigi, Hutter, Frank
Hyperparameters of Deep Learning (DL) pipelines are crucial for their downstream performance. While a large number of methods for Hyperparameter Optimization (HPO) have been developed, the costs they incur are often untenable for modern DL. Consequently, manual experimentation is still the most prevalent approach to optimize hyperparameters, relying on the researcher's intuition, domain knowledge, and cheap preliminary explorations. To resolve this misalignment between HPO algorithms and DL researchers, we propose PriorBand, an HPO algorithm tailored to DL that is able to utilize both expert beliefs and cheap proxy tasks. Empirically, we demonstrate PriorBand's efficiency across a range of DL benchmarks and show its gains under informative expert input as well as its robustness against poor expert beliefs.
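As an illustration of how an expert belief can steer sampling, the toy 1-D loop below mixes a prior-centred, an incumbent-centred, and a uniform proposal; the function names, weight schedule, and Gaussian prior are assumptions made for illustration and do not reproduce PriorBand's actual sampling policy or its Hyperband machinery.

import random

def sample_config(prior_mu, prior_sigma, incumbent, w_prior, w_inc, low=0.0, high=1.0):
    """Draw from a mixture of prior-centred, incumbent-centred, and uniform proposals."""
    r = random.random()
    if r < w_prior:                                       # trust the expert belief
        x = random.gauss(prior_mu, prior_sigma)
    elif r < w_prior + w_inc and incumbent is not None:   # exploit the current best
        x = random.gauss(incumbent, 0.1 * (high - low))
    else:                                                 # uniform exploration
        x = random.uniform(low, high)
    return min(max(x, low), high)

def priorband_like(evaluate, rounds=5, batch=6, budget=1, prior_mu=0.3, prior_sigma=0.1):
    """Toy loop: sampling weight shifts from the expert prior toward the incumbent."""
    incumbent, best = None, float("inf")
    for t in range(rounds):
        w_prior = 0.6 * (0.5 ** t)                 # illustrative decay of prior trust
        w_inc = 0.0 if incumbent is None else 0.4  # illustrative incumbent weight
        for _ in range(batch):
            x = sample_config(prior_mu, prior_sigma, incumbent, w_prior, w_inc)
            y = evaluate(x, budget)                # stands in for a cheap proxy evaluation
            if y < best:
                best, incumbent = y, x
    return incumbent, best

# Example: a quadratic toy objective whose optimum (0.4) lies near the prior (0.3).
print(priorband_like(lambda x, b: (x - 0.4) ** 2))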
Squirrel: A Switching Hyperparameter Optimizer
Awad, Noor, Shala, Gresa, Deng, Difan, Mallik, Neeratyoy, Feurer, Matthias, Eggensperger, Katharina, Biedenkapp, André, Vermetten, Diederick, Wang, Hao, Doerr, Carola, Lindauer, Marius, Hutter, Frank
In this short note, we describe our submission to the NeurIPS 2020 BBO challenge. Motivated by the fact that different optimizers work well on different problems, our approach switches between different optimizers. Since the team names on the competition's leaderboard were randomly generated "alliteration nicknames", consisting of an adjective and an animal with the same initial letter, we called our approach the Switching Squirrel, or, for short, Squirrel. The challenge required suggesting 16 successive batches of 8 hyperparameter configurations at a time. We chose to use only one optimizer for a given batch, warmstarted with all previous observations.
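A toy sketch of that batch-wise switching scheme; the optimizer interface (warmstart/suggest) and the schedule function are hypothetical placeholders, not Squirrel's actual portfolio (which switched between phases such as an initial design, Bayesian optimization, and evolutionary search).

def switching_loop(optimizers, schedule, objective, n_batches=16, batch_size=8):
    """schedule(b) returns the index of the optimizer to use for batch b."""
    observations = []                          # shared history of (config, value) pairs
    for b in range(n_batches):
        opt = optimizers[schedule(b)]          # one optimizer owns this entire batch
        opt.warmstart(observations)            # hand it all observations made so far
        configs = opt.suggest(batch_size)      # it proposes 8 configurations
        values = [objective(c) for c in configs]
        observations.extend(zip(configs, values))
    return min(observations, key=lambda cv: cv[1])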