AITopics

2403.00025

Country:

Europe > Germany (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Media (0.93)
Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

arXiv.org Artificial Intelligence Dec-12-2023

Leveraging sparse and shared feature activations for disentangled representation learning

Fumero, Marco, Wenzel, Florian, Zancato, Luca, Achille, Alessandro, Rodolà, Emanuele, Soatto, Stefano, Schölkopf, Bernhard, Locatello, Francesco

Research on recovering the latent factors of variation of high-dimensional data has so far focused on simple synthetic settings. Mostly building on unsupervised and weakly-supervised objectives, prior work missed out on the positive implications for representation learning on real-world data. In this work, we propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation. Assuming that each supervised task only depends on an unknown subset of the factors of variation, we disentangle the feature space of a supervised multi-task model, with features activating sparsely across different tasks and information being shared as appropriate. Importantly, we never directly observe the factors of variation, but establish that access to multiple tasks is sufficient for identifiability under sufficiency and minimality assumptions. We validate our approach on six real-world distribution shift benchmarks and different data modalities (images, text), demonstrating how disentangled representations can be transferred to real settings.

artificial intelligence, machine learning, natural language

2304.07939

Country:

Europe (1.00)
North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence Nov-30-2023

Image retrieval outperforms diffusion models on data augmentation

Burg, Max F., Wenzel, Florian, Zietlow, Dominik, Horn, Max, Makansi, Osama, Locatello, Francesco, Russell, Chris

Many approaches have been proposed to use diffusion models to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large datasets, often with noisy annotations, and it remains an open question to what extent these models contribute to downstream classification performance. In particular, it remains unclear whether they generalize enough to improve over directly using the additional data of their pre-training process for augmentation. We systematically evaluate a range of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. Personalizing diffusion models towards the target data outperforms simpler prompting strategies. However, using the pre-training data of the diffusion model alone, via a simple nearest-neighbor retrieval procedure, leads to even stronger downstream performance. Our study explores the potential of diffusion models in generating new training data, and surprisingly finds that these sophisticated models are not yet able to beat a simple and strong image retrieval baseline on simple downstream vision tasks.
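The nearest-neighbor retrieval idea mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which embedding model, similarity measure, or index the authors use, so the cosine similarity, the toy 2-D embeddings, and the function names (cosine_similarity, retrieve_nearest) below are all assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_nearest(query, pool, k=2):
    """Return indices of the k pool embeddings most similar to the query.

    Hypothetical helper: `pool` stands in for embeddings of a large
    pre-training dataset, `query` for the embedding of a target image
    or class prompt.
    """
    scored = [(cosine_similarity(query, emb), i) for i, emb in enumerate(pool)]
    # Sort by descending similarity, breaking ties by index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


# Toy embeddings (assumed values, not real image features).
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.05]
neighbors = retrieve_nearest(query, pool, k=2)
print(neighbors)
```

In an augmentation setting, the retrieved items would be added to the training set of the downstream classifier; a real system would likely replace the linear scan with an approximate nearest-neighbor index.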

artificial intelligence, diffusion model, machine learning

2304.10253

Country:

Europe > Germany (0.29)
North America > United States (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.46)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence Oct-18-2023

Evaluating the Fairness of Discriminative Foundation Models in Computer Vision

Ali, Junaid, Kleindessner, Matthaeus, Wenzel, Florian, Budhathoki, Kailash, Cevher, Volkan, Russell, Chris

We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP), that are used for labeling tasks. We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy. Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning. We categorize desired behaviors along three axes: (i) whether the task concerns humans; (ii) how subjective the task is (i.e., how likely it is that people from a diverse range of backgrounds would agree on a labeling); and (iii) the intended purpose of the task and whether fairness is better served by impartiality (i.e., making decisions independent of the protected attributes) or representation (i.e., making decisions to maximize diversity). Finally, we provide quantitative fairness evaluations for both binary-valued and multi-valued protected attributes over ten diverse datasets. We find that fair PCA, a post-processing method for fair representations, works very well for debiasing in most of the aforementioned tasks while incurring only minor loss of performance. However, different debiasing approaches vary in their effectiveness depending on the task. Hence, one should choose the debiasing approach depending on the specific use case.

discriminative foundation model, large language model, natural language

doi: 10.1145/3600211.3604720

2310.11867

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

arXiv.org Artificial Intelligence Jun-19-2023

Multi-Symmetry Ensembles: Improving Diversity and Generalization via Opposing Symmetries

Loh, Charlotte, Han, Seungwook, Sudalairaj, Shivchander, Dangovski, Rumen, Xu, Kai, Wenzel, Florian, Soljacic, Marin, Srivastava, Akash

Deep ensembles (DE) have been successful in improving model performance by learning diverse members via the stochasticity of random initialization. While recent works have attempted to promote further diversity in DE via hyperparameters or regularizing loss functions, these methods primarily still rely on a stochastic approach to explore the hypothesis space. In this work, we present Multi-Symmetry Ensembles (MSE), a framework for constructing diverse ensembles by capturing the multiplicity of hypotheses along symmetry axes, which explore the hypothesis space beyond stochastic perturbations of model weights and hyperparameters. We leverage recent advances in contrastive representation learning to create models that separately capture opposing hypotheses of invariant and equivariant functional classes and present a simple ensembling approach to efficiently combine appropriate hypotheses for a given task. We show that MSE effectively captures the multiplicity of conflicting hypotheses that is often required in large, diverse datasets like ImageNet. As a result of their inherent diversity, MSE improves classification performance, uncertainty quantification, and generalization across a series of transfer tasks.

artificial intelligence, hypothesis, machine learning

2303.02484

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

arXiv.org Machine Learning Oct-7-2021

Sparse MoEs meet Efficient Ensembles

Allingham, James Urquhart, Wenzel, Florian, Mariet, Zelda E, Mustafa, Basil, Puigcerver, Joan, Houlsby, Neil, Jerfel, Ghassen, Fortuin, Vincent, Lakshminarayanan, Balaji, Snoek, Jasper, Tran, Dustin, Ruiz, Carlos Riquelme, Jenatton, Rodolphe

Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, lead to strong performance. We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixture of experts (sparse MoEs). First, we show that these two approaches have complementary features whose combination is beneficial. Then, we present partitioned batch ensembles, an efficient ensemble of sparse MoEs that takes the best of both classes of models. Extensive experiments on fine-tuned vision transformers demonstrate the accuracy, log-likelihood, few-shot learning, robustness, and uncertainty calibration improvements of our approach over several challenging baselines. Partitioned batch ensembles not only scale to models with up to 2.7B parameters, but also provide larger performance gains for larger models.

artificial intelligence, machine learning, neural network

2110.0336

Country:

Europe (0.28)
Oceania > Australia (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.45)

arXiv.org Machine Learning Oct-6-2021

Deep Classifiers with Label Noise Modeling and Distance Awareness

Fortuin, Vincent, Collier, Mark, Wenzel, Florian, Allingham, James, Liu, Jeremiah, Tran, Dustin, Lakshminarayanan, Balaji, Berent, Jesse, Jenatton, Rodolphe, Kokiopoulou, Effrosyni

Uncertainty estimation in deep learning has recently emerged as a crucial area of interest to advance reliability and robustness in safety-critical applications. While many proposed methods focus either on distance-aware model uncertainties for out-of-distribution detection or on input-dependent label uncertainties for in-distribution calibration, both of these types of uncertainty are often necessary. In this work, we propose the HetSNGP method for jointly modeling the model and data uncertainty. We show that our proposed model affords a favorable combination between these two complementary types of uncertainty and thus outperforms the baseline methods on some challenging out-of-distribution datasets, including CIFAR-100-C, ImageNet-C, and ImageNet-A. Moreover, we propose HetSNGP Ensemble, an ensembled version of our method which adds a further type of uncertainty and also outperforms other ensemble baselines.

artificial intelligence, machine learning, neural network

2110.02609

Country:

Europe (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

arXiv.org Machine Learning Jun-22-2021

On Stein Variational Neural Network Ensembles

D'Angelo, Francesco, Fortuin, Vincent, Wenzel, Florian

Ensembles of deep neural networks have achieved great success recently, but they do not offer a proper Bayesian justification. Moreover, while they allow for averaging of predictions over several hypotheses, they do not provide any guarantees for their diversity, leading to redundant solutions in function space. In contrast, particle-based inference methods, such as Stein variational gradient descent (SVGD), offer a Bayesian framework, but rely on the choice of a kernel to measure the similarity between ensemble members. In this work, we study different SVGD methods operating in the weight space, function space, and in a hybrid setting. We compare the SVGD approaches to other ensembling-based methods in terms of their theoretical properties and assess their empirical performance on synthetic and real-world tasks. We find that SVGD using functional and hybrid kernels can overcome the limitations of deep ensembles. It improves on functional diversity and uncertainty estimation and approaches the true Bayesian posterior more closely. Moreover, we show that using stochastic SVGD updates, as opposed to the standard deterministic ones, can further improve the performance.
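As a rough illustration of the particle update behind Stein variational gradient descent, the sketch below runs standard deterministic SVGD with an RBF kernel on a one-dimensional Gaussian target. It is a toy under stated assumptions: the target N(2, 1), the fixed bandwidth, the step size, and the names svgd_step and grad_log_gaussian are chosen here for illustration and are not taken from the paper, which studies weight-space, function-space, and hybrid kernels for neural networks.

```python
import math


def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One deterministic SVGD update for scalar particles with an RBF kernel."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2 * bandwidth ** 2))
            # Driving term: kernel-weighted gradient of the log density.
            attract = k * grad_log_p(xj)
            # Repulsive term: gradient of k(xj, xi) with respect to xj,
            # which pushes xi away from nearby particles.
            repulse = k * (xi - xj) / bandwidth ** 2
            phi += attract + repulse
        updated.append(xi + step_size * phi / n)
    return updated


def grad_log_gaussian(x, mean=2.0, std=1.0):
    """Gradient of log N(x; mean, std^2) with respect to x (assumed toy target)."""
    return -(x - mean) / std ** 2


particles = [-1.0, -0.5, 0.0, 0.5, 1.0]
for _ in range(500):
    particles = svgd_step(particles, grad_log_gaussian)

mean_estimate = sum(particles) / len(particles)
spread = max(particles) - min(particles)
print(mean_estimate, spread)
```

The repulsive term is what keeps the particles from collapsing onto the mode, which is the source of the diversity that the abstract contrasts with plain deep ensembles.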

bayesian inference, neural network

2106.1076

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

arXiv.org Machine Learning Feb-12-2021

Bayesian Neural Network Priors Revisited

Fortuin, Vincent, Garriga-Alonso, Adrià, Wenzel, Florian, Rätsch, Gunnar, Turner, Richard, van der Wilk, Mark, Aitchison, Laurence

In a Bayesian neural network (BNN), we specify a prior p(w) over the neural network parameters, and compute the posterior distribution over parameters conditioned on training data, p(w | x, y) = p(y | w, x) p(w) / p(y | x). This procedure should give considerable advantages for reasoning about predictive uncertainty, which is especially relevant in the small-data setting. Crucially, to perform Bayesian inference, we need to choose a prior that accurately reflects our beliefs about the parameters before seeing any data (Bayes, 1763; Gelman et al., 2013). However, the most common choice of the prior for BNN weights is the simplest one: the isotropic Gaussian. Isotropic Gaussians are used across almost all fields of Bayesian deep learning, ranging from variational inference (Blundell et al., 2015; Dusenberry et al., 2020), to sampling-based inference (Zhang et al., 2019), and even to infinite networks (Lee et al., 2017; Garriga-Alonso et al., 2019). This is troubling, since isotropic Gaussian priors are almost certainly not the best choice. Indeed, despite the progress on more accurate and efficient inference procedures, in most settings, the posterior predictive of BNNs using a Gaussian prior still leads to worse predictive performance than a baseline obtained by training the network with standard stochastic gradient descent (SGD) (e.g., Zhang et al., 2019; Heek & Kalchbrenner, 2019; Wenzel et al., 2020a). However, it has been shown that the performance of BNNs can be improved by artificially reducing posterior uncertainty using "cold posteriors" (Wenzel et al., 2020a).
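To make the prior-to-posterior step concrete, here is a minimal worked example for a case where Bayes' rule has a closed form: a single weight w in the model y = w * x + noise, with a zero-mean Gaussian prior on w and Gaussian observation noise. This is far simpler than a neural network and is not from the paper; the data values, the noise and prior scales, and the function name gaussian_posterior are assumptions for illustration.

```python
def gaussian_posterior(xs, ys, prior_std=1.0, noise_std=0.5):
    """Closed-form posterior over w for y = w * x + Gaussian noise.

    Prior: w ~ N(0, prior_std^2). Returns (posterior mean, posterior variance).
    """
    # Posterior precision = prior precision + data precision.
    precision = 1.0 / prior_std ** 2 + sum(x * x for x in xs) / noise_std ** 2
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_std ** 2) / precision
    return mean, 1.0 / precision


# Toy data (assumed values), roughly following y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
post_mean, post_var = gaussian_posterior(xs, ys)
print(post_mean, post_var)
```

With these numbers the posterior variance (1/57) is much smaller than the prior variance (1), showing how the data sharpen the belief about w. For real networks the posterior has no such closed form, which is why approximate inference is needed.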

cold posterior effect, deep learning, neural network

2102.06571

Country:

Europe (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

arXiv.org Machine Learning Oct-26-2020

Hyperparameter Ensembles for Robustness and Uncertainty Quantification

Wenzel, Florian, Snoek, Jasper, Tran, Dustin, Jenatton, Rodolphe

Ensembles over neural network weights trained from different random initializations, known as deep ensembles, achieve state-of-the-art accuracy and calibration. The recently introduced batch ensembles provide a drop-in replacement that is more parameter efficient. In this paper, we design ensembles not only over weights, but over hyperparameters to improve the state of the art in both settings. For best performance independent of budget, we propose hyper-deep ensembles, a simple procedure that involves a random search over different hyperparameters, themselves stratified across multiple random initializations. Its strong performance highlights the benefit of combining models with both weight and hyperparameter diversity. We further propose a parameter-efficient version, hyper-batch ensembles, which builds on the layer structure of batch ensembles and self-tuning networks. The computational and memory costs of our method are notably lower than typical ensembles. On image classification tasks, with MLP, LeNet, ResNet-20 and Wide ResNet 28-10 architectures, we improve upon both deep and batch ensembles.

deep learning, ensemble, neural network

2006.1357

Country: North America > Canada (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)