AITopics | Fortuin, Vincent

Collaborating Authors

Fortuin, Vincent

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Pathologies in priors and inference for Bayesian transformers

Cinquin, Tristan, Immer, Alexander, Horn, Max, Fortuin, Vincent

arXiv.org Machine LearningOct-15-2021

In recent years, the transformer has established itself as a workhorse in many applications ranging from natural language processing to reinforcement learning. Similarly, Bayesian deep learning has become the gold-standard for uncertainty estimation in safety-critical applications, where robustness and calibration are crucial. Surprisingly, no successful attempts to improve transformer models in terms of predictive uncertainty using Bayesian inference exist. In this work, we study this curiously underpopulated area of Bayesian transformers. We find that weight-space inference in transformers does not work well, regardless of the approximate posterior. We also find that the prior is at least partially at fault, but that it is very hard to find well-specified weight priors for these models. We hypothesize that these problems stem from the complexity of obtaining a meaningful mapping from weight-space to function-space distributions in the transformer. Therefore, moving closer to function-space, we propose a novel method based on the implicit reparameterization of the Dirichlet distribution to apply variational inference directly to the attention weights. We find that this proposed method performs competitively with our baselines.

diagnostic medicine, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2110.0402

Country: Europe > Switzerland > Zürich > Zürich (0.15)

Genre: Research Report (0.84)

Industry: Health & Medicine > Diagnostic Medicine (0.42)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
(2 more...)

Add feedback

Sparse MoEs meet Efficient Ensembles

Allingham, James Urquhart, Wenzel, Florian, Mariet, Zelda E, Mustafa, Basil, Puigcerver, Joan, Houlsby, Neil, Jerfel, Ghassen, Fortuin, Vincent, Lakshminarayanan, Balaji, Snoek, Jasper, Tran, Dustin, Ruiz, Carlos Riquelme, Jenatton, Rodolphe

arXiv.org Machine LearningOct-7-2021

Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, lead to strong performance. We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs). First, we show that these two approaches have complementary features whose combination is beneficial. Then, we present partitioned batch ensembles, an efficient ensemble of sparse MoEs that takes the best of both classes of models. Extensive experiments on fine-tuned vision transformers demonstrate the accuracy, log-likelihood, few-shot learning, robustness, and uncertainty calibration improvements of our approach over several challenging baselines. Partitioned batch ensembles not only scale to models with up to 2.7B parameters, but also provide larger performance gains for larger models.

artificial intelligence, machine learning, neural network, (19 more...)

arXiv.org Machine Learning

2110.0336

Country:

Europe (0.28)
Oceania > Australia (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.45)

Add feedback

Deep Classifiers with Label Noise Modeling and Distance Awareness

Fortuin, Vincent, Collier, Mark, Wenzel, Florian, Allingham, James, Liu, Jeremiah, Tran, Dustin, Lakshminarayanan, Balaji, Berent, Jesse, Jenatton, Rodolphe, Kokiopoulou, Effrosyni

arXiv.org Machine LearningOct-6-2021

Uncertainty estimation in deep learning has recently emerged as a crucial area of interest to advance reliability and robustness in safety-critical applications. While there have been many proposed methods that either focus on distance-aware model uncertainties for out-of-distribution detection or on input-dependent label uncertainties for in-distribution calibration, both of these types of uncertainty are often necessary. In this work, we propose the HetSNGP method for jointly modeling the model and data uncertainty. We show that our proposed model affords a favorable combination between these two complementary types of uncertainty and thus outperforms the baseline methods on some challenging out-of-distribution datasets, including CIFAR-100C, Imagenet-C, and Imagenet-A. Moreover, we propose HetSNGP Ensemble, an ensembled version of our method which adds an additional type of uncertainty and also outperforms other ensemble baselines.

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Machine Learning

2110.02609

Country:

Europe (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Neural Variational Gradient Descent

di Langosco, Lauro Langosco, Fortuin, Vincent, Strathmann, Heiko

arXiv.org Machine LearningJul-29-2021

Particle-based approximate Bayesian inference approaches such as Stein Variational Gradient Descent (SVGD) combine the flexibility and convergence guarantees of sampling methods with the computational benefits of variational inference. In practice, SVGD relies on the choice of an appropriate kernel function, which impacts its ability to model the target distribution -- a challenging problem with only heuristic solutions. We propose Neural Variational Gradient Descent (NVGD), which is based on parameterizing the witness function of the Stein discrepancy by a deep neural network whose parameters are learned in parallel to the inference, mitigating the necessity to make any kernel choices whatsoever. We empirically evaluate our method on popular synthetic inference problems, real-world Bayesian linear regression, and Bayesian neural network inference.

bayesian inference, neural network, stein discrepancy, (16 more...)

arXiv.org Machine Learning

2107.10731

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.52)

Add feedback

Repulsive Deep Ensembles are Bayesian

D'Angelo, Francesco, Fortuin, Vincent

arXiv.org Machine LearningJul-25-2021

Deep ensembles have recently gained popularity in the deep learning community for their conceptual simplicity and efficiency. However, maintaining functional diversity between ensemble members that are independently trained with gradient descent is challenging. This can lead to pathologies when adding more ensemble members, such as a saturation of the ensemble performance, which converges to the performance of a single model. Moreover, this does not only affect the quality of its predictions, but even more so the uncertainty estimates of the ensemble, and thus its performance on out-of-distribution data. We hypothesize that this limitation can be overcome by discouraging different ensemble members from collapsing to the same function. To this end, we introduce a kernelized repulsive term in the update rule of the deep ensembles. We show that this simple modification not only enforces and maintains diversity among the members but, even more importantly, transforms the maximum a posteriori inference into proper Bayesian inference. Namely, we show that the training dynamics of our proposed repulsive ensembles follow a Wasserstein gradient flow of the KL divergence with the true posterior. We study repulsive terms in weight and function space and empirically compare their performance to standard ensembles and Bayesian baselines on synthetic and real-world prediction tasks.

artificial intelligence, machine learning, repulsive deep ensemble, (6 more...)

arXiv.org Machine Learning

2106.11642

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.87)

Add feedback

A Bayesian Approach to Invariant Deep Neural Networks

Mourdoukoutas, Nikolaos, Federici, Marco, Pantalos, Georges, van der Wilk, Mark, Fortuin, Vincent

arXiv.org Machine LearningJul-20-2021

Contributions We propose a method to learn such weight-sharing schemes from data. As a proof of concept, we focus on being invariant We propose a novel Bayesian neural network architecture to two types of transformations applied on images, that can learn invariances from data namely rotations and flips. However, our algorithm can be alone by inferring a posterior distribution over applied to any other choice of symmetry, as long as the corresponding different weight-sharing schemes. We show that weight-sharing scheme is available. Apart from our model outperforms other non-invariant architectures, achieving good performance during inference, our model is when trained on datasets that contain able to learn such invariances from data. This is achieved by specific invariances. The same holds true when specifying a probability distribution over the weight-sharing no data augmentation is performed.

deep learning, invariance, neural network, (17 more...)

arXiv.org Machine Learning

2107.09301

Country:

North America > United States > New York (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.83)

Add feedback

On Stein Variational Neural Network Ensembles

D'Angelo, Francesco, Fortuin, Vincent, Wenzel, Florian

arXiv.org Machine LearningJun-22-2021

Ensembles of deep neural networks have achieved great success recently, but they do not offer a proper Bayesian justification. Moreover, while they allow for averaging of predictions over several hypotheses, they do not provide any guarantees for their diversity, leading to redundant solutions in function space. In contrast, particle-based inference methods, such as Stein variational gradient descent (SVGD), offer a Bayesian framework, but rely on the choice of a kernel to measure the similarity between ensemble members. In this work, we study different SVGD methods operating in the weight space, function space, and in a hybrid setting. We compare the SVGD approaches to other ensembling-based methods in terms of their theoretical properties and assess their empirical performance on synthetic and real-world tasks. We find that SVGD using functional and hybrid kernels can overcome the limitations of deep ensembles. It improves on functional diversity and uncertainty estimation and approaches the true Bayesian posterior more closely. Moreover, we show that using stochastic SVGD updates, as opposed to the standard deterministic ones, can further improve the performance.

arxiv preprint arxiv, bayesian inference, neural network, (17 more...)

arXiv.org Machine Learning

2106.1076

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Add feedback

Data augmentation in Bayesian neural networks and the cold posterior effect

Nabarro, Seth, Ganev, Stoil, Garriga-Alonso, Adrià, Fortuin, Vincent, van der Wilk, Mark, Aitchison, Laurence

arXiv.org Machine LearningJun-10-2021

Data augmentation is a highly effective approach for improving performance in deep neural networks. The standard view is that it creates an enlarged dataset by adding synthetic data, which raises a problem when combining it with Bayesian inference: how much data are we really conditioning on? This question is particularly relevant to recent observations linking data augmentation to the cold posterior effect. We investigate various principled ways of finding a log-likelihood for augmented datasets. Our approach prescribes augmenting the same underlying image multiple times, both at test and train-time, and averaging either the logits or the predictive probabilities. Empirically, we observe the best performance with averaging probabilities. While there are interactions with the cold posterior effect, neither averaging logits or averaging probabilities eliminates it.

augmentation, deep learning, neural network, (15 more...)

arXiv.org Machine Learning

2106.05586

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

BNNpriors: A library for Bayesian neural network inference with different prior distributions

Fortuin, Vincent, Garriga-Alonso, Adrià, van der Wilk, Mark, Aitchison, Laurence

arXiv.org Machine LearningMay-14-2021

Bayesian neural networks have shown great promise in many applications where calibrated uncertainty estimates are crucial and can often also lead to a higher predictive performance. However, it remains challenging to choose a good prior distribution over their weights. While isotropic Gaussian priors are often chosen in practice due to their simplicity, they do not reflect our true prior beliefs well and can lead to suboptimal performance. Our new library, BNNpriors, enables state-of-the-art Markov Chain Monte Carlo inference on Bayesian neural networks with a wide range of predefined priors, including heavy-tailed ones, hierarchical ones, and mixture priors. Moreover, it follows a modular approach that eases the design and implementation of new custom priors. It has facilitated foundational discoveries on the nature of the cold posterior effect in Bayesian neural networks and will hopefully catalyze future research as well as practical applications in this area.

deep learning, library, neural network, (13 more...)

arXiv.org Machine Learning

doi: 10.1016/j.simpa.2021.100079

2105.06964

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)

Add feedback

Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning

Immer, Alexander, Bauer, Matthias, Fortuin, Vincent, Rätsch, Gunnar, Khan, Mohammad Emtiyaz

arXiv.org Machine LearningMay-11-2021

Marginal-likelihood based model-selection, even though promising, is rarely used in deep learning due to estimation difficulties. Instead, most approaches rely on validation data, which may not be readily available. In this work, we present a scalable marginal-likelihood estimation method to select both the hyperparameters and network architecture based on the training data alone. Some hyperparameters can be estimated online during training, simplifying the procedure. Our marginal-likelihood estimate is based on Laplace's method and Gauss-Newton approximations to the Hessian, and it outperforms cross-validation and manual-tuning on standard regression and image classification datasets, especially in terms of calibration and out-of-distribution detection. Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable (e.g., in nonstationary settings).

approximation, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

2104.04975

Country:

North America > United States > Arizona (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.50)

Add feedback