AITopics | Archambeau, Cedric

Collaborating Authors

Archambeau, Cedric

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Structural Pruning of Pre-trained Language Models via Neural Architecture Search

Klein, Aaron, Golebiowski, Jacek, Ma, Xingchen, Perrone, Valerio, Archambeau, Cedric

arXiv.org Artificial IntelligenceMay-3-2024

Pre-trained language models (PLM), for example BERT or RoBERTa, mark the state-of-the-art for natural language understanding task when fine-tuned on labeled data. However, their large size poses challenges in deploying them for inference in real-world applications, due to significant GPU memory requirements and high inference latency. This paper explores neural architecture search (NAS) for structural pruning to find sub-parts of the fine-tuned network that optimally trade-off efficiency, for example in terms of model size or latency, and generalization performance. We also show how we can utilize more recently developed two-stage weight-sharing NAS approaches in this setting to accelerate the search process. Unlike traditional pruning methods with fixed thresholds, we propose to adopt a multi-objective approach that identifies the Pareto optimal set of sub-networks, allowing for a more flexible and automated compression process.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.02267

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

A Negative Result on Gradient Matching for Selective Backprop

Balles, Lukas, Archambeau, Cedric, Zappella, Giovanni

arXiv.org Artificial IntelligenceDec-8-2023

With increasing scale in model and dataset size, the training of deep neural networks becomes a massive computational burden. One approach to speed up the training process is Selective Backprop. For this approach, we perform a forward pass to obtain a loss value for each data point in a minibatch. The backward pass is then restricted to a subset of that minibatch, prioritizing high-loss examples. We build on this approach, but seek to improve the subset selection mechanism by choosing the (weighted) subset which best matches the mean gradient over the entire minibatch. We use the gradients w.r.t. the model's last layer as a cheap proxy, resulting in virtually no overhead in addition to the forward pass. At the same time, for our experiments we add a simple random selection baseline which has been absent from prior work. Surprisingly, we find that both the loss-based as well as the gradient-matching strategy fail to consistently outperform the random baseline.

artificial intelligence, batch size, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2312.05021

Country: North America > United States > Oregon (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Optimizing Hyperparameters with Conformal Quantile Regression

Salinas, David, Golebiowski, Jacek, Klein, Aaron, Seeger, Matthias, Archambeau, Cedric

arXiv.org Artificial IntelligenceMay-5-2023

Many state-of-the-art hyperparameter optimization (HPO) algorithms rely on model-based optimizers that learn surrogate models of the target function to guide the search. Gaussian processes are the de facto surrogate model due to their ability to capture uncertainty but they make strong assumptions about the observation noise, which might not be warranted in practice. In this work, we propose to leverage conformalized quantile regression which makes minimal assumptions about the observation noise and, as a result, models the target function in a more realistic and robust fashion which translates to quicker HPO convergence on empirical benchmarks. To apply our method in a multi-fidelity setting, we propose a simple, yet effective, technique that aggregates observed results across different resource levels and outperforms conventional methods across many empirical tasks.

artificial intelligence, data mining, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2305.03623

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Data Science > Data Mining (0.67)

Add feedback

Renate: A Library for Real-World Continual Learning

Wistuba, Martin, Ferianc, Martin, Balles, Lukas, Archambeau, Cedric, Zappella, Giovanni

arXiv.org Artificial IntelligenceApr-24-2023

Continual learning enables the incremental training of machine learning models on non-stationary data streams.While academic interest in the topic is high, there is little indication of the use of state-of-the-art continual learning algorithms in practical machine learning deployment. This paper presents Renate, a continual learning library designed to build real-world updating pipelines for PyTorch models. We discuss requirements for the use of continual learning algorithms in practice, from which we derive design principles for Renate. We give a high-level description of the library components and interfaces. Finally, we showcase the strengths of the library by presenting experimental results. Renate may be found at https://github.com/awslabs/renate.

artificial intelligence, machine learning, renate, (16 more...)

arXiv.org Artificial Intelligence

2304.12067

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

Add feedback

Fortuna: A Library for Uncertainty Quantification in Deep Learning

Detommaso, Gianluca, Gasparin, Alberto, Donini, Michele, Seeger, Matthias, Wilson, Andrew Gordon, Archambeau, Cedric

arXiv.org Artificial IntelligenceFeb-8-2023

Fortuna supports a range of calibration techniques, such as conformal prediction that can be applied to any trained neural network to generate reliable uncertainty estimates, and scalable Bayesian inference methods that can be applied to Flax-based deep neural networks trained from scratch for improved uncertainty quantification and accuracy. By providing a coherent framework for advanced uncertainty quantification methods, Fortuna simplifies the process of benchmarking and helps practitioners build robust AI systems.

artificial intelligence, deep learning, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2302.04019

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Memory Efficient Continual Learning with Transformers

Ermis, Beyza, Zappella, Giovanni, Wistuba, Martin, Rawal, Aditya, Archambeau, Cedric

arXiv.org Artificial IntelligenceJan-13-2023

In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is difficult to prevent due to practical constraints. For instance, the amount of data that can be stored or the computational resources that can be used might be limited. Moreover, applications increasingly rely on large pre-trained neural networks, such as pre-trained Transformers, since the resources or data might not be available in sufficiently large quantities to practitioners to train the model from scratch. In this paper, we devise a method to incrementally train a model on a sequence of tasks using pre-trained Transformers and extending them with Adapters. Different than the existing approaches, our method is able to scale to a large number of tasks without significant overhead and allows sharing information across tasks. On both image and text classification tasks, we empirically demonstrate that our method maintains a good predictive performance without retraining the model or increasing the number of model parameters over time. The resulting model is also significantly faster at inference time compared to Adapter-based state-of-the-art methods.

artificial intelligence, machine learning, memory efficient continual learning, (1 more...)

arXiv.org Artificial Intelligence

2203.0464

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Private Synthetic Data for Multitask Learning and Marginal Queries

Vietri, Giuseppe, Archambeau, Cedric, Aydore, Sergul, Brown, William, Kearns, Michael, Roth, Aaron, Siva, Ankit, Tang, Shuai, Wu, Zhiwei Steven

arXiv.org Artificial IntelligenceSep-15-2022

We provide a differentially private algorithm for producing synthetic data simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to directly handle numerical features, in contrast to a number of related prior approaches which require numerical features to be first converted into {high cardinality} categorical features via {a binning strategy}. Higher binning granularity is required for better accuracy, but this negatively impacts scalability. Eliminating the need for binning allows us to produce synthetic data preserving large numbers of statistical queries such as marginals on numerical features, and class conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in both the real and synthetic data. This is the property that is needed to train a linear classifier in a multitask setting. Our algorithm also allows us to produce high quality synthetic data for mixed marginal queries, that combine both categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques, and provides significant accuracy improvements in both marginal queries and linear prediction tasks for mixed-type datasets.

artificial intelligence, machine learning, multitask learning and marginal query, (1 more...)

arXiv.org Artificial Intelligence

2209.074

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.40)

Add feedback

Meta-Forecasting by combining Global Deep Representations with Local Adaptation

Grazzi, Riccardo, Flunkert, Valentin, Salinas, David, Januschowski, Tim, Seeger, Matthias, Archambeau, Cedric

arXiv.org Artificial IntelligenceNov-12-2021

While classical time series forecasting considers individual time series in isolation, recent advances based on deep learning showed that jointly learning from a large pool of related time series can boost the forecasting accuracy. However, the accuracy of these methods suffers greatly when modeling out-of-sample time series, significantly limiting their applicability compared to classical forecasting methods. To bridge this gap, we adopt a meta-learning view of the time series forecasting problem. We introduce a novel forecasting method, called Meta Global-Local Auto-Regression (Meta-GLAR), that adapts to each time series by learning in closed-form the mapping from the representations produced by a recurrent neural network (RNN) to one-step-ahead forecasts. Crucially, the parameters of the RNN are learned across multiple time series by backpropagating through the closed-form adaptation mechanism. In our extensive empirical evaluation we show that our method is competitive with the state-of-the-art in out-of-sample forecasting accuracy reported in earlier work.

artificial intelligence, machine learning, survey article, (17 more...)

arXiv.org Artificial Intelligence

2111.03418

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Energy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Overfitting in Bayesian Optimization: an empirical study and early-stopping solution

Makarova, Anastasia, Shen, Huibin, Perrone, Valerio, Klein, Aaron, Faddoul, Jean Baptiste, Krause, Andreas, Seeger, Matthias, Archambeau, Cedric

arXiv.org Artificial IntelligenceApr-16-2021

Bayesian Optimization (BO) is a successful methodology to tune the hyperparameters of machine learning algorithms. The user defines a metric of interest, such as the validation error, and BO finds the optimal hyperparameters that minimize it. However, the metric improvements on the validation set may not translate to the test set, especially on small datasets. In other words, BO can overfit. While cross-validation mitigates this, it comes with high computational cost. In this paper, we carry out the first systematic investigation of overfitting in BO and demonstrate that this is a serious yet often overlooked concern in practice. We propose the first problem-adaptive and interpretable criterion to early stop BO, reducing overfitting while mitigating the cost of cross-validation. Experimental results on real-world hyperparameter optimization tasks show that our approach can substantially reduce compute time with little to no loss of test accuracy,demonstrating a clear practical advantage over existing techniques.

artificial intelligence, optimization problem, variance, (17 more...)

arXiv.org Artificial Intelligence

2104.08166

Country:

North America > United States (0.46)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

BORE: Bayesian Optimization by Density-Ratio Estimation

Tiao, Louis C., Klein, Aaron, Seeger, Matthias, Bonilla, Edwin V., Archambeau, Cedric, Ramos, Fabio

arXiv.org Machine LearningFeb-17-2021

Bayesian optimization (BO) is among the most effective and widely-used blackbox optimization methods. BO proposes solutions according to an explore-exploit trade-off criterion encoded in an acquisition function, many of which are computed from the posterior predictive of a probabilistic surrogate model. Prevalent among these is the expected improvement (EI) function. The need to ensure analytical tractability of the predictive often poses limitations that can hinder the efficiency and applicability of BO. In this paper, we cast the computation of EI as a binary classification problem, building on the link between class-probability estimation and density-ratio estimation, and the lesser-known link between density-ratios and EI. By circumventing the tractability constraints, this reformulation provides numerous advantages, not least in terms of expressiveness, versatility, and scalability.

bayesian inference, neural network, optimization, (16 more...)

arXiv.org Machine Learning

2102.09009

Country:

Oceania > Australia (0.14)
North America > United States (0.14)
Europe > Germany (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback