Möllenhoff, Thomas
Uncertainty-Aware Decoding with Minimum Bayes Risk
Daheim, Nico, Meister, Clara, Möllenhoff, Thomas, Gurevych, Iryna
Despite their outstanding performance in the majority of scenarios, contemporary language models still occasionally generate undesirable outputs, for example, hallucinated text. While such behaviors have previously been linked to uncertainty, there is a notable lack of methods that actively consider uncertainty during text generation. In this work, we show how Minimum Bayes Risk (MBR) decoding, which selects model generations according to an expected risk, can be generalized into a principled uncertainty-aware decoding method. In short, we account for model uncertainty during decoding by incorporating a posterior over model parameters into MBR's computation of expected risk. We show that this modified expected risk is useful for both choosing outputs and deciding when to abstain from generation and can provide improvements without incurring overhead. We benchmark different methods for learning posteriors and show that performance improves with prediction diversity. We release our code publicly.
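The idea can be sketched in a few lines. This is a hypothetical minimal illustration, not the paper's implementation: a toy token-overlap F1 stands in for the utility metric, and a list of hypothesis sets, one per posterior sample of the model, stands in for the learned posterior. All names are illustrative.

```python
def pairwise_utility(a: str, b: str) -> float:
    """Toy utility: token-set F1 overlap (stand-in for BLEU, BERTScore, etc.)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    overlap = len(ta & tb)
    p, r = overlap / len(ta), overlap / len(tb)
    return 2 * p * r / (p + r) if p + r else 0.0

def uncertainty_aware_mbr(samples_per_model, abstain_below=None):
    """samples_per_model: one list of generated hypotheses per posterior
    sample of the model. Expected utility is averaged over hypotheses AND
    over posterior samples, which is the uncertainty-aware twist on MBR."""
    candidates = [h for hyps in samples_per_model for h in hyps]

    def expected_utility(c):
        per_model = [sum(pairwise_utility(c, h) for h in hyps) / len(hyps)
                     for hyps in samples_per_model]
        return sum(per_model) / len(per_model)  # average over the posterior

    best = max(candidates, key=expected_utility)
    score = expected_utility(best)
    if abstain_below is not None and score < abstain_below:
        return None, score  # even the best candidate is too risky: abstain
    return best, score
```

A low expected utility under the posterior signals that no candidate is agreed upon across plausible models, which is what motivates using the same quantity for abstention.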
Natural Variational Annealing for Multimodal Optimization
Minh, Tâm Le, Arbel, Julyan, Möllenhoff, Thomas, Khan, Mohammad Emtiyaz, Forbes, Florence
We introduce a new multimodal optimization approach called Natural Variational Annealing (NVA) that combines the strengths of three foundational concepts to simultaneously search for multiple global and local modes of black-box nonconvex objectives. First, it implements a simultaneous search by using variational posteriors, such as mixtures of Gaussians. Second, it applies annealing to gradually trade off exploration for exploitation. Finally, it learns the variational search distribution using natural-gradient learning, where the updates resemble well-known and easy-to-implement algorithms. The three concepts come together in NVA, giving rise to new algorithms and also allowing us to incorporate "fitness shaping", a core concept from evolutionary algorithms. We assess the quality of search on simulations and compare NVA to methods based on gradient descent and evolution strategies. We also provide an application to a real-world inverse problem in planetary science.
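A heavily simplified sketch of the flavor of such a method, under strong assumptions: a single Gaussian search distribution instead of a mixture, no fitness shaping, a geometric temperature schedule, and a score-function Monte Carlo estimate of the natural-gradient step for the mean. The function name and hyperparameters are illustrative, not the paper's.

```python
import random

def nva_sketch(f, dim=1, steps=300, pop=50, lr=0.05, t0=5.0, t_final=0.05, seed=0):
    """Minimize black-box f by natural-gradient search with annealing.
    Natural-gradient step for the Gaussian mean (score-function estimate):
        mu <- mu - lr * E[(f(x) - baseline) / T * (x - mu)]
    with temperature T annealed from t0 down to t_final."""
    rng = random.Random(seed)
    mu = [rng.uniform(-3, 3) for _ in range(dim)]
    sigma = 1.0
    for step in range(steps):
        frac = step / max(steps - 1, 1)
        temp = t0 * (t_final / t0) ** frac            # geometric annealing schedule
        xs = [[m + sigma * rng.gauss(0, 1) for m in mu] for _ in range(pop)]
        fs = [f(x) for x in xs]
        base = sum(fs) / pop                          # baseline reduces variance
        for j in range(dim):
            g = sum((fv - base) / temp * (x[j] - mu[j]) for fv, x in zip(fs, xs)) / pop
            mu[j] -= lr * g
        sigma = max(0.05, sigma * 0.99)               # shrink search width: exploitation
    return mu
```

High temperature early on flattens the objective (exploration); as it anneals, the update increasingly exploits the local landscape, which is the trade-off the abstract describes.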
How to Weight Multitask Finetuning? Fast Previews via Bayesian Model-Merging
Maldonado, Hugo Monzón, Möllenhoff, Thomas, Daheim, Nico, Gurevych, Iryna, Khan, Mohammad Emtiyaz
When finetuning on multiple tasks together, it is important to weigh them carefully to get good performance, but searching for good weights can be difficult and costly. Here, we propose to aid the search with fast previews that quickly give a rough idea of different reweighting options. We use model merging to create previews by simply reusing and averaging the parameters of models trained on each task separately (no retraining required). To improve the quality of the previews, we propose a Bayesian approach that designs new merging strategies by using more flexible posteriors. We validate our findings on vision and natural-language transformers. Our work shows the benefits of model merging via Bayes for improving multitask finetuning.
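The simplest member of this family of previews can be sketched as plain reweighted parameter averaging; this is an illustrative reduction, assuming per-task parameter dictionaries, and omits the more flexible Bayesian posteriors the paper proposes on top.

```python
def merge_preview(task_params, weights):
    """Fast preview of a multitask-weighting choice: instead of retraining
    with task weights w, average the separately trained per-task parameters,
        theta(w) = sum_t w_t * theta_t / sum_t w_t.
    Trying a new weighting only re-runs this average, never training."""
    total = sum(weights)
    keys = task_params[0].keys()
    return {k: sum(w * p[k] for w, p in zip(weights, task_params)) / total
            for k in keys}
```

Because the per-task models are trained once and reused, sweeping over many candidate weightings costs only cheap averaging plus evaluation, which is what makes the previews fast.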
Variational Low-Rank Adaptation Using IVON
Cong, Bai, Daheim, Nico, Shen, Yuesong, Cremers, Daniel, Yokota, Rio, Khan, Mohammad Emtiyaz, Möllenhoff, Thomas
We show that variational learning can significantly improve the accuracy and calibration of Low-Rank Adaptation (LoRA) without a substantial increase in cost. We replace AdamW with the Improved Variational Online Newton (IVON) algorithm to finetune large language models. For Llama-2 with 7 billion parameters, IVON improves the accuracy over AdamW by 2.8% and the expected calibration error by 4.6%. The accuracy is also better than that of other Bayesian alternatives, yet the cost is lower and the implementation is easier. Our work provides additional evidence for the effectiveness of IVON for large language models.
Conformal Prediction via Regression-as-Classification
Guha, Etash, Natarajan, Shlok, Möllenhoff, Thomas, Khan, Mohammad Emtiyaz, Ndiaye, Eugene
Conformal prediction (CP) for regression can be challenging, especially when the output distribution is heteroscedastic, multimodal, or skewed. Some of the issues can be addressed by estimating a distribution over the output, but in practice such approaches can be sensitive to estimation error and yield unstable intervals. Here, we circumvent these challenges by converting regression to a classification problem and then using CP for classification to obtain CP sets for regression. To preserve the ordering of the continuous-output space, we design a new loss function and make the necessary modifications to the CP classification techniques. Empirical results on many benchmarks show that this simple approach gives surprisingly good results on many practical problems.
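The recipe can be sketched with standard split conformal prediction; this is a minimal, hypothetical illustration that assumes a pre-trained classifier over output bins and uses the usual 1-minus-probability nonconformity score. The ordering-preserving loss function, which is the paper's main contribution, is omitted here.

```python
import math

def conformal_regression_as_classification(probs_cal, y_cal, bins, alpha=0.1):
    """Split-conformal CP after discretizing the regression target.
    probs_cal[i][b]: classifier probability of bin b for calibration point i.
    bins: sorted bin centers; y_cal: true continuous targets.
    Returns a function mapping test-point probabilities to a CP set of
    output values (bin centers)."""
    def bin_index(y):
        return min(range(len(bins)), key=lambda b: abs(bins[b] - y))  # nearest bin

    # Nonconformity score: 1 - probability assigned to the true bin.
    scores = sorted(1.0 - p[bin_index(y)] for p, y in zip(probs_cal, y_cal))
    n = len(scores)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)  # conformal quantile index
    qhat = scores[k]

    def predict_set(probs_test):
        return [bins[b] for b, p in enumerate(probs_test) if 1.0 - p <= qhat]
    return predict_set
```

Because the set is built per bin rather than as a single symmetric interval, it can naturally come out disconnected for multimodal outputs, which interval-based CP cannot do.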
Variational Learning is Effective for Large Deep Networks
Shen, Yuesong, Daheim, Nico, Cong, Bai, Nickl, Peter, Marconi, Gian Maria, Bazan, Clement, Yokota, Rio, Gurevych, Iryna, Cremers, Daniel, Khan, Mohammad Emtiyaz, Möllenhoff, Thomas
We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam but its predictive uncertainty is better. We show several new use cases of IVON where we improve fine-tuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence in support of effectiveness of variational learning.
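The core loop can be sketched as a variational online Newton step; this is a simplified, illustrative version, not IVON itself. The posterior is a diagonal Gaussian, and for clarity the sketch plugs in an exact diagonal Hessian supplied by the caller, whereas IVON's key property is estimating it cheaply from gradients alone at Adam-like cost. All names and hyperparameters are hypothetical.

```python
import random

def von_sketch(grad, hess_diag, dim, steps=500, lr=0.1, rho=0.1,
               delta=1e-4, lam=100.0, seed=0):
    """Diagonal-Gaussian posterior N(m, 1/(lam*(h+delta))). Gradients are
    evaluated at a SAMPLED theta, which is what injects weight uncertainty
    into training; h tracks the Hessian (posterior precision) online."""
    rng = random.Random(seed)
    m = [0.0] * dim   # posterior mean (plays the role of the weights)
    h = [1.0] * dim   # running diagonal Hessian estimate
    for _ in range(steps):
        sigma = [1.0 / (lam * (hj + delta)) ** 0.5 for hj in h]
        theta = [mj + sj * rng.gauss(0, 1) for mj, sj in zip(m, sigma)]
        g, hd = grad(theta), hess_diag(theta)
        for j in range(dim):
            h[j] = (1 - rho) * h[j] + rho * hd[j]                 # online precision update
            m[j] -= lr * (g[j] + delta * m[j]) / (h[j] + delta)   # Newton-like step
    return m, h
```

The returned (m, h) define the whole Gaussian posterior, so calibration, model averaging, and sensitivity estimates come for free after training, which is what the new use cases above exploit.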
SAM as an Optimal Relaxation of Bayes
Möllenhoff, Thomas, Khan, Mohammad Emtiyaz
Sharpness-aware minimization (SAM) and related adversarial deep-learning methods can drastically improve generalization, but their underlying mechanisms are not yet fully understood. Here, we establish SAM as a relaxation of the Bayes objective where the expected negative-loss is replaced by the optimal convex lower bound, obtained by using the so-called Fenchel biconjugate. The connection enables a new Adam-like extension of SAM to automatically obtain reasonable uncertainty estimates, while sometimes also improving its accuracy. By connecting adversarial and Bayesian methods, our work opens a new path to robustness.
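For reference, one SAM step is just a gradient step taken at an adversarially perturbed point; the sketch below is the standard two-step procedure (function names are illustrative). The paper's result is that this perturbed-gradient step can be read as an optimal convex relaxation of the Bayesian expected loss.

```python
def sam_step(params, grad, lr=0.01, rho=0.05):
    """One sharpness-aware minimization step.
    1) Ascend to the worst-case nearby point: eps = rho * g / ||g||.
    2) Descend using the gradient evaluated at that perturbed point."""
    g = grad(params)
    norm = sum(gj * gj for gj in g) ** 0.5 or 1.0            # avoid division by zero
    adv = [p + rho * gj / norm for p, gj in zip(params, g)]  # adversarial ascent
    g_adv = grad(adv)                                        # gradient at worst case
    return [p - lr * gj for p, gj in zip(params, g_adv)]
```

Replacing the single adversarial point by a Gaussian expectation recovers the Bayes objective; the Fenchel biconjugate makes the adversarial version its tightest convex lower bound, which is what enables the Adam-like uncertainty-aware extension.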
The Memory Perturbation Equation: Understanding Model's Sensitivity to Data
Nickl, Peter, Xu, Lu, Tailor, Dharmesh, Möllenhoff, Thomas, Khan, Mohammad Emtiyaz
Understanding a model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE), which relates a model's sensitivity to perturbations in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide variety of models and algorithms, and unravels useful properties regarding sensitivities. Our empirical results show that sensitivity estimates obtained during training can be used to faithfully predict generalization on unseen test data. The proposed equation is expected to be useful for future research on robust and adaptive learning.
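A sensitivity estimate in this spirit can be sketched as follows, assuming per-example gradients and a diagonal precision (Hessian) estimate at the trained weights; this is an illustrative simplification, not the paper's general equation.

```python
def memory_perturbation(grads, hess_diag_total, delta=1e-6):
    """Leave-one-out sensitivity sketch: deleting example i shifts the
    solution by roughly
        m_{-i} - m ~= H^{-1} * grad_i(m),
    where H is the (here diagonal) precision at the trained weights and
    grad_i the per-example loss gradient. Returns the norm of the predicted
    shift per example; large values flag examples the model depends on."""
    shifts = []
    for g in grads:
        dev = [gj / (hj + delta) for gj, hj in zip(g, hess_diag_total)]
        shifts.append(sum(d * d for d in dev) ** 0.5)
    return shifts
```

Since the gradients and precision are byproducts of training, such estimates are available during training at little extra cost, which is what makes on-the-fly generalization prediction feasible.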
Model Merging by Uncertainty-Based Gradient Matching
Daheim, Nico, Möllenhoff, Thomas, Ponti, Edoardo Maria, Gurevych, Iryna, Khan, Mohammad Emtiyaz
Models trained on different datasets can be merged by a weighted-averaging of their parameters, but why does it work and when can it fail? Here, we connect the inaccuracy of weighted-averaging to mismatches in the gradients and propose a new uncertainty-based scheme to improve the performance by reducing the mismatch. The connection also reveals implicit assumptions in other schemes such as averaging, task arithmetic, and Fisher-weighted averaging. Our new method gives consistent improvements for large language models and vision transformers, both in terms of performance and robustness to hyperparameters.
The Lie-Group Bayesian Learning Rule
Kıral, Eren Mehmet, Möllenhoff, Thomas, Khan, Mohammad Emtiyaz
The Bayesian Learning Rule provides a framework for generic algorithm design but can be difficult to use for three reasons. First, it requires a specific parameterization of the exponential family. Second, it uses gradients which can be difficult to compute. Third, its update may not always stay on the manifold. We address these difficulties by proposing an extension based on Lie groups where posteriors are parametrized through transformations of an arbitrary base distribution and updated via the group's exponential map. This simplifies all three difficulties for many cases, providing flexible parametrizations through the group's action, simple gradient computation through reparameterization, and updates that always stay on the manifold. We use the new learning rule to derive a new deep-learning algorithm with desirable biologically plausible attributes that learns sparse features. Our work opens a new frontier for the design of algorithms that exploit Lie-group structures.
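The manifold-preserving property can be illustrated with the simplest possible example, assuming a positive scale parameter living on the multiplicative group of positive reals; this toy update is not the paper's algorithm, only an illustration of moving along a group's exponential map.

```python
import math

def lie_group_step(scale, grad_scale, lr=0.1):
    """Toy Lie-group update for a positive scale parameter on (R_+, *).
    An additive step scale - lr*grad could go negative and leave the
    manifold; moving along the group's exponential map,
        s <- s * exp(-lr * s * grad),
    where s*grad is the gradient pulled back to the Lie algebra, keeps s
    positive for ANY step size."""
    return scale * math.exp(-lr * scale * grad_scale)
```

The same pattern, composing the current point with an exponentiated Lie-algebra element, is what lets updates on richer transformation groups stay on the manifold without projections.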