Klein, Aaron
Hyperband-based Bayesian Optimization for Black-box Prompt Selection
Schneider, Lennart, Wistuba, Martin, Klein, Aaron, Golebiowski, Jacek, Zappella, Giovanni, Merra, Felice Antonio
Optimal prompt selection is crucial for maximizing large language model (LLM) performance on downstream tasks. As the most powerful models are proprietary and can only be invoked via an API, users often manually refine prompts in a black-box setting by adjusting instructions and few-shot examples until they achieve good performance as measured on a validation set. Recent methods addressing static black-box prompt selection face significant limitations: they often fail to leverage the inherent structure of prompts, treating instructions and few-shot exemplars as a single block of text. Moreover, they often lack query efficiency by evaluating prompts on all validation instances, or risk sub-optimal selection of a prompt by using random subsets of validation instances. We introduce HbBoPs, a novel Hyperband-based Bayesian optimization method for black-box prompt selection addressing these key limitations. Our approach combines a structural-aware deep kernel Gaussian Process to model prompt performance with Hyperband as a multi-fidelity scheduler to select the number of validation instances for prompt evaluations. The structural-aware modeling approach utilizes separate embeddings for instructions and few-shot exemplars, enhancing the surrogate model's ability to capture prompt performance and predict which prompt to evaluate next in a sample-efficient manner. Together with Hyperband as a multi-fidelity scheduler, we further enable query efficiency by adaptively allocating resources across different fidelity levels, keeping the total number of validation instances that prompts are evaluated on low. Extensive evaluation across ten benchmarks and three LLMs demonstrates that HbBoPs outperforms state-of-the-art methods.
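A minimal sketch of the multi-fidelity scheduling idea only (successive halving, the inner loop of Hyperband), with the number of validation instances as the fidelity; the deep kernel GP surrogate and the acquisition step of HbBoPs are omitted, and `evaluate_prompt`, `min_instances`, and `eta` are illustrative assumptions rather than the paper's interface.

```python
# Successive halving over candidate prompts; the fidelity is the number of
# validation instances a prompt is evaluated on. Not the authors' code.
import random


def successive_halving(prompts, validation_set, evaluate_prompt,
                       min_instances=8, eta=3):
    """Keep the best 1/eta of prompts at each rung, growing the budget by eta."""
    survivors = list(prompts)
    budget = min_instances
    while len(survivors) > 1 and budget <= len(validation_set):
        subset = validation_set[:budget]                 # low-fidelity evaluation
        scores = {p: evaluate_prompt(p, subset) for p in survivors}
        survivors = sorted(survivors, key=scores.get, reverse=True)
        survivors = survivors[:max(1, len(survivors) // eta)]
        budget *= eta                                    # promote to the next rung
    return survivors[0]


random.seed(0)
prompts = [f"instruction-{i}" for i in range(9)]
validation_set = list(range(100))
# toy scorer standing in for, e.g., LLM accuracy on the validation subset
score = lambda p, subset: int(p.split("-")[1]) + random.random()
print(successive_halving(prompts, validation_set, score))
```

Full Hyperband would run several such brackets with different trade-offs between the number of prompts and the starting budget, and HbBoPs additionally uses its structural-aware surrogate to propose which prompts to evaluate.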
Warmstarting for Scaling Language Models
Mallik, Neeratyoy, Janowski, Maciej, Hog, Johannes, Rakotoarison, Herilalaina, Klein, Aaron, Grabocka, Josif, Hutter, Frank
Scaling model sizes to scale performance has worked remarkably well for the current large language model paradigm. The research and empirical findings of various scaling studies have led to novel scaling results and laws that guide subsequent research. The high training costs at contemporary scales of data and models result in a lack of thorough understanding of how to tune and arrive at such training setups. One direction to ameliorate the cost of pretraining large models is to warmstart the large-scale training from smaller models that are cheaper to tune. In this work, we attempt to understand whether the behavior of optimal hyperparameters can be retained under warmstarting for scaling. We explore simple operations that allow the application of theoretically motivated methods for zero-shot transfer of optimal hyperparameters using {\mu}Transfer. We investigate the aspects that contribute to the speedup in convergence and the preservation of stable training dynamics under warmstarting with {\mu}Transfer. We find that shrinking the smaller model's weights, zero-padding, and perturbing the resulting larger model with a scaled initialization from {\mu}P enable effective warmstarting with {\mu}Transfer.
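A toy numpy sketch of the kind of growth operation described above, for a single weight matrix only: the small model's weights are shrunk and copied, the new rows and columns are zero-padded, and the padded region is perturbed with a 1/fan_in-scaled initialization in the spirit of {\mu}P. The shrink factor and scaling constants here are assumptions for illustration and may differ from the paper.

```python
# Illustrative sketch only: grow a weight matrix from a small model to a larger
# width by (i) shrinking the copied weights, (ii) zero-padding the new rows and
# columns, and (iii) perturbing them with a muP-style 1/fan_in-scaled init.
import numpy as np


def warmstart_linear(small_w, new_out, new_in, shrink=0.5, rng=None):
    rng = np.random.default_rng(rng)
    old_out, old_in = small_w.shape
    big_w = np.zeros((new_out, new_in))
    big_w[:old_out, :old_in] = shrink * small_w        # shrink + copy
    # perturb the padded region with variance ~ 1/fan_in (muP-style scaling)
    pad_std = 1.0 / np.sqrt(new_in)
    mask = np.ones_like(big_w, dtype=bool)
    mask[:old_out, :old_in] = False
    big_w[mask] = rng.normal(0.0, pad_std, size=mask.sum())
    return big_w


small = np.random.default_rng(0).normal(size=(4, 4))
print(warmstart_linear(small, new_out=8, new_in=8, rng=1).shape)  # (8, 8)
```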
Large Language Model Compression with Neural Architecture Search
Sukthanker, Rhea Sanjay, Staffler, Benedikt, Hutter, Frank, Klein, Aaron
Large language models (LLMs) exhibit remarkable reasoning abilities, allowing them to generalize across a wide range of downstream tasks, such as commonsense reasoning or instruction following. However, as LLMs scale, inference costs become increasingly prohibitive, accumulating significantly over their life cycle. This poses the question: Can we compress pre-trained LLMs to meet diverse size and latency requirements? We leverage Neural Architecture Search (NAS) to compress LLMs by pruning structural components, such as attention heads, neurons, and layers, aiming to achieve a Pareto-optimal balance between performance and efficiency. While NAS has already achieved promising results on small language models in previous work, in this paper we propose several extensions that allow us to scale to LLMs. Compared to structural pruning baselines, we show that NAS improves performance by up to 3.4% on MMLU while also providing an on-device latency speedup.
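For the multi-objective selection step, the sketch below shows plain non-dominated filtering over measured (error, latency) pairs of candidate pruned sub-networks; it is not the paper's NAS pipeline, and the candidate values are made up.

```python
# Generic sketch of the multi-objective selection step: given measured
# (error, latency) pairs for candidate pruned sub-networks, keep only the
# Pareto-optimal ones. Standard non-dominated filtering.
def pareto_front(candidates):
    """candidates: list of (name, error, latency); lower is better for both."""
    front = []
    for name, err, lat in candidates:
        dominated = any(e <= err and l <= lat and (e < err or l < lat)
                        for _, e, l in candidates)
        if not dominated:
            front.append((name, err, lat))
    return front


subnets = [("full", 0.30, 120.0), ("prune-heads", 0.32, 80.0),
           ("prune-layers", 0.35, 60.0), ("bad", 0.40, 100.0)]
print(pareto_front(subnets))  # "bad" is dominated and dropped
```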
Hyperparameter Optimization in Machine Learning
Franceschi, Luca, Donini, Michele, Perrone, Valerio, Klein, Aaron, Archambeau, Cédric, Seeger, Matthias, Pontil, Massimiliano, Frasconi, Paolo
Hyperparameters are configuration variables controlling the behavior of machine learning algorithms. They are ubiquitous in machine learning and artificial intelligence, and the choice of their values determines the effectiveness of systems based on these technologies. Manual hyperparameter search is often unsatisfactory and becomes infeasible when the number of hyperparameters is large. Automating the search is an important step towards automating machine learning, freeing researchers and practitioners alike from the burden of finding a good set of hyperparameters by trial and error. In this survey, we present a unified treatment of hyperparameter optimization, providing the reader with examples and insights into the state of the art. We cover the main families of techniques to automate hyperparameter search, often referred to as hyperparameter optimization or tuning, including random and quasi-random search, as well as bandit-, model-, and gradient-based approaches. We further discuss extensions, including online, constrained, and multi-objective formulations, touch upon connections with other fields such as meta-learning and neural architecture search, and conclude with open questions and future research directions.
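As a concrete instance of the simplest family covered in the survey, here is a minimal random-search loop over a continuous search space; the objective is a toy stand-in for the validation metric of a real model.

```python
# Minimal random-search HPO loop, the simplest family of methods covered by
# the survey. `objective` stands in for a validation metric of a real model.
import math
import random


def random_search(objective, space, n_trials=50, seed=0):
    rng = random.Random(seed)
    best_cfg, best_val = None, math.inf
    for _ in range(n_trials):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        val = objective(cfg)
        if val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val


space = {"log_lr": (-5.0, -1.0), "weight_decay": (0.0, 0.1)}
# toy objective with an optimum at log_lr = -3, weight_decay = 0
objective = lambda c: (c["log_lr"] + 3.0) ** 2 + c["weight_decay"]
print(random_search(objective, space))
```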
HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models
Sukthanker, Rhea Sanjay, Zela, Arber, Staffler, Benedikt, Klein, Aaron, Purucker, Lennart, Franke, Joerg K. H., Hutter, Frank
The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying optimal model configurations under specific hardware constraints is becoming essential but remains challenging due to the computational load of exhaustive training and evaluation on multiple devices. To address this, we introduce HW-GPT-Bench, a hardware-aware benchmark that utilizes surrogate predictions to approximate various hardware metrics across 13 devices for architectures in the GPT-2 family with up to 774M parameters. Our surrogates, via calibrated predictions and reliable uncertainty estimates, faithfully model the heteroscedastic noise inherent in energy and latency measurements. To estimate perplexity, we employ weight-sharing techniques from Neural Architecture Search (NAS), inheriting pretrained weights from the largest GPT-2 model. Finally, we demonstrate the utility of HW-GPT-Bench by simulating optimization trajectories of various multi-objective optimization algorithms in just a few seconds.
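The sketch below illustrates the general surrogate-benchmark idea rather than HW-GPT-Bench's actual models or API: a cheap regressor is fit on measured latencies and queried at new architectures, with the per-tree spread serving as a rough uncertainty estimate. All features and measurements here are synthetic.

```python
# Hedged sketch of the surrogate-benchmark idea: fit a cheap regressor on
# measured (architecture, latency) pairs and return a mean plus an uncertainty
# estimate, so optimizers can be simulated without hardware access.
# NOT HW-GPT-Bench's actual surrogate; features and data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# toy architecture features: (num_layers, embed_dim, num_heads)
X = rng.integers(low=(6, 256, 4), high=(24, 1024, 16), size=(200, 3))
# synthetic latency with input-dependent (heteroscedastic) noise
latency = 0.05 * X[:, 0] * X[:, 1] / 100 + rng.normal(0, 0.1 * X[:, 0], 200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, latency)
query = np.array([[12, 768, 12]])
per_tree = np.array([t.predict(query)[0] for t in model.estimators_])
print(f"predicted latency: {per_tree.mean():.2f} +/- {per_tree.std():.2f} ms")
```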
Structural Pruning of Pre-trained Language Models via Neural Architecture Search
Klein, Aaron, Golebiowski, Jacek, Ma, Xingchen, Perrone, Valerio, Archambeau, Cedric
Pre-trained language models (PLMs), for example BERT or RoBERTa, mark the state of the art for natural language understanding tasks when fine-tuned on labeled data. However, their large size poses challenges in deploying them for inference in real-world applications, due to significant GPU memory requirements and high inference latency. This paper explores neural architecture search (NAS) for structural pruning to find sub-parts of the fine-tuned network that optimally trade off efficiency, for example in terms of model size or latency, against generalization performance. We also show how we can utilize more recently developed two-stage weight-sharing NAS approaches in this setting to accelerate the search process. Unlike traditional pruning methods with fixed thresholds, we propose to adopt a multi-objective approach that identifies the Pareto-optimal set of sub-networks, allowing for a more flexible and automated compression process.
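A hedged sketch of the second, search-only stage of such a two-stage approach: sub-network configurations are sampled from a hypothetical space and scored by a placeholder `score_subnetwork` callable that would deactivate the corresponding components of the already fine-tuned super-network. The resulting (error, size) pairs could then be filtered for Pareto optimality as in the earlier sketch.

```python
# Sketch of the second stage of a two-stage weight-sharing NAS search: the
# super-network is assumed to be fine-tuned already, and sub-networks are
# scored by deactivating components (layers, heads, units) at inference time.
# `score_subnetwork` is a hypothetical callable wrapping that evaluation.
import random


def stage_two_search(score_subnetwork, n_samples=20, seed=0):
    rng = random.Random(seed)
    space = {"num_layers": [6, 9, 12], "num_heads": [6, 9, 12],
             "ffn_ratio": [0.5, 0.75, 1.0]}
    results = []
    for _ in range(n_samples):
        cfg = {k: rng.choice(v) for k, v in space.items()}
        results.append((cfg, score_subnetwork(cfg)))
    return results  # feed into a Pareto filter as in the earlier sketch


# toy scorer: (validation error, relative size), both affected by pruning
toy = lambda c: (0.1 + 0.6 / c["num_layers"],
                 c["num_layers"] / 12 * c["ffn_ratio"])
for cfg, (err, size) in stage_two_search(toy, n_samples=3):
    print(cfg, f"err={err:.3f}", f"size={size:.2f}")
```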
Obeying the Order: Introducing Ordered Transfer Hyperparameter Optimisation
Hellan, Sigrid Passano, Shen, Huibin, Aubet, François-Xavier, Salinas, David, Klein, Aaron
We introduce ordered transfer hyperparameter optimisation (OTHPO), a version of transfer learning for hyperparameter optimisation (HPO) in which the tasks follow a sequential order. Unlike in state-of-the-art transfer HPO, the assumption is that each task is most correlated with those immediately before it. This matches many deployed settings, where hyperparameters are retuned as more data is collected; for instance, tuning a sequence of movie recommendation systems as more movies and ratings are added. We propose a formal definition, outline the differences to related problems, and propose a basic OTHPO method that outperforms state-of-the-art transfer HPO. We empirically show the importance of taking order into account using ten benchmarks. The benchmarks are in the setting of gradually accumulating data and span XGBoost, random forest, approximate k-nearest neighbor, elastic net, support vector machines, and a separate real-world motivated optimisation problem. We open-source the benchmarks to foster future research on ordered transfer HPO.
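A simple illustration of exploiting task order, not the paper's exact OTHPO method: the search for task t is seeded with the best configurations found on task t-1 before falling back to random candidates; all names and the toy objective are assumptions.

```python
# Illustration of order-aware warmstarting: when tuning task t, seed the search
# with the best configurations from task t-1 before sampling new random ones.
import random


def tune_task(objective, space, warmstart_cfgs, n_trials=20, seed=0):
    rng = random.Random(seed)
    candidates = list(warmstart_cfgs)                     # from task t-1
    while len(candidates) < n_trials:                     # fill with random
        candidates.append({k: rng.uniform(*b) for k, b in space.items()})
    evaluated = [(objective(cfg), cfg) for cfg in candidates]
    return sorted(evaluated, key=lambda t: t[0])[:3]      # top-3 to pass on


space = {"lr": (1e-4, 1e-1), "subsample": (0.5, 1.0)}
best_prev = [{"lr": 0.01, "subsample": 0.8}]              # best from task t-1
task_t = lambda c: abs(c["lr"] - 0.02) + abs(c["subsample"] - 0.9)
print(tune_task(task_t, space, best_prev))
```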
Optimizing Hyperparameters with Conformal Quantile Regression
Salinas, David, Golebiowski, Jacek, Klein, Aaron, Seeger, Matthias, Archambeau, Cedric
Many state-of-the-art hyperparameter optimization (HPO) algorithms rely on model-based optimizers that learn surrogate models of the target function to guide the search. Gaussian processes are the de facto surrogate model due to their ability to capture uncertainty, but they make strong assumptions about the observation noise, which might not be warranted in practice. In this work, we propose to leverage conformalized quantile regression, which makes minimal assumptions about the observation noise and, as a result, models the target function in a more realistic and robust fashion, which translates to quicker HPO convergence on empirical benchmarks. To apply our method in a multi-fidelity setting, we propose a simple, yet effective, technique that aggregates observed results across different resource levels and outperforms conventional methods across many empirical tasks.
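A generic sketch of conformalized quantile regression as a surrogate-building block (the standard CQR recipe, not the paper's full multi-fidelity HPO pipeline): lower and upper quantile models are fit on one split, and their interval is widened by the empirical quantile of conformity scores from a calibration split, which requires only exchangeability rather than a Gaussian noise model.

```python
# Split conformalized quantile regression on synthetic heteroscedastic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2 + 0.1 * np.abs(X[:, 0]))  # input-dependent noise

X_fit, y_fit, X_cal, y_cal = X[:300], y[:300], X[300:], y[300:]
alpha = 0.1  # target miscoverage
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_fit, y_fit)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_fit, y_fit)

# conformity scores on the calibration split, then widen the interval
scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
q = np.quantile(scores, np.ceil((1 - alpha) * (len(y_cal) + 1)) / len(y_cal))

x_new = np.array([[1.0]])
print("interval:", lo.predict(x_new) - q, hi.predict(x_new) + q)
```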
Overfitting in Bayesian Optimization: an empirical study and early-stopping solution
Makarova, Anastasia, Shen, Huibin, Perrone, Valerio, Klein, Aaron, Faddoul, Jean Baptiste, Krause, Andreas, Seeger, Matthias, Archambeau, Cedric
Bayesian Optimization (BO) is a successful methodology to tune the hyperparameters of machine learning algorithms. The user defines a metric of interest, such as the validation error, and BO finds the optimal hyperparameters that minimize it. However, the metric improvements on the validation set may not translate to the test set, especially on small datasets. In other words, BO can overfit. While cross-validation mitigates this, it comes at a high computational cost. In this paper, we carry out the first systematic investigation of overfitting in BO and demonstrate that this is a serious yet often overlooked concern in practice. We propose the first problem-adaptive and interpretable criterion to early-stop BO, reducing overfitting while mitigating the cost of cross-validation. Experimental results on real-world hyperparameter optimization tasks show that our approach can substantially reduce compute time with little to no loss of test accuracy, demonstrating a clear practical advantage over existing techniques.
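A heavily hedged sketch of what such a stopping rule can look like, assuming one compares an optimistic surrogate-based estimate of the remaining improvement against the fold-to-fold variation of the incumbent's cross-validation scores; the paper's actual criterion differs in its details.

```python
# Illustrative early-stopping check for BO: stop once the estimated remaining
# improvement (a confidence-bound gap from the surrogate) falls below an
# estimate of the validation noise across cross-validation folds.
import numpy as np


def should_stop(mu, sigma, incumbent_cv_scores, beta=2.0):
    """mu, sigma: surrogate mean/std on candidate configs (to be minimized)."""
    best_lcb = np.min(mu - beta * sigma)        # optimistic best achievable value
    incumbent = np.mean(incumbent_cv_scores)
    estimated_gain = max(0.0, incumbent - best_lcb)
    noise_level = np.std(incumbent_cv_scores)   # fold-to-fold variation
    return estimated_gain <= noise_level


mu = np.array([0.21, 0.23, 0.25])
sigma = np.array([0.005, 0.01, 0.02])
print(should_stop(mu, sigma, incumbent_cv_scores=[0.20, 0.22, 0.21]))
```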
Hyperparameter Transfer Learning with Adaptive Complexity
Horváth, Samuel, Klein, Aaron, Richtárik, Peter, Archambeau, Cédric
Bayesian optimization (BO) is a sample-efficient approach to automatically tune the hyperparameters of machine learning models. In practice, one frequently has to solve similar hyperparameter tuning problems sequentially. For example, one might have to tune a type of neural network learned across a series of different classification problems. Recent work on multi-task BO exploits knowledge gained from previous tuning tasks to speed up a new tuning task. However, previous approaches do not account for the fact that BO is a sequential decision-making procedure. Hence, there is in general a mismatch between the number of evaluations collected in the current tuning task and the number of evaluations accumulated in all previously completed tasks. In this work, we enable multi-task BO to compensate for this mismatch, such that the transfer learning procedure is able to handle different data regimes in a principled way. We propose a new multi-task BO method that learns a set of ordered, non-linear basis functions of increasing complexity via nested dropout and automatic relevance determination. Experiments on a variety of hyperparameter tuning problems show that our method improves sample efficiency.
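A toy sketch of the nested dropout mechanism on an ordered set of basis-function outputs, assuming a geometric truncation distribution; it illustrates the ordering effect only, not the paper's full multi-task BO model or its automatic relevance determination.

```python
# Nested dropout over ordered basis functions: sample a truncation index k and
# drop every basis beyond it, so early bases must capture the coarse structure
# while later ones add finer detail.
import numpy as np

rng = np.random.default_rng(0)
num_bases, dim = 8, 16
features = rng.normal(size=(num_bases, dim))   # outputs of the basis functions


def nested_dropout(features, p=0.3, rng=rng):
    k = rng.geometric(p)                        # sampled truncation index
    k = min(k, features.shape[0])
    masked = features.copy()
    masked[k:] = 0.0                            # drop all bases after index k
    return masked, k


masked, k = nested_dropout(features)
print("kept the first", k, "basis functions")
```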