Goto

Collaborating Authors

 Bayesian Inference


Human-Corrected Labels Learning: Enhancing Labels Quality via Human Correction of VLMs Discrepancies

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs), with their powerful content generation capabilities, have been successfully applied to data annotation processes. However, the VLM-generated labels exhibit dual limitations: low quality (i.e., label noise) and absence of error correction mechanisms. To enhance label quality, we propose Human-Corrected Labels (HCLs), a novel setting that efficient human correction for VLM-generated noisy labels. As shown in Figure 1(b), HCL strategically deploys human correction only for instances with VLM discrepancies, achieving both higher-quality annotations and reduced labor costs. Specifically, we theoretically derive a risk-consistent estimator that incorporates both human-corrected labels and VLM predictions to train classifiers. Besides, we further propose a conditional probability method to estimate the label distribution using a combination of VLM outputs and model predictions. Extensive experiments demonstrate that our approach achieves superior classification performance and is robust to label noise, validating the effectiveness of HCL in practical weak supervision scenarios. Code https://github.com/Lilianach24/HCL.git


Bayesian Evaluation of Large Language Model Behavior

arXiv.org Machine Learning

It is increasingly important to evaluate how text generation systems based on large language models (LLMs) behave, such as their tendency to produce harmful output or their sensitivity to adversarial inputs. Such evaluations often rely on a curated benchmark set of input prompts provided to the LLM, where the output for each prompt may be assessed in a binary fashion (e.g., harmful/non-harmful or does not leak/leaks sensitive information), and the aggregation of binary scores is used to evaluate the LLM. However, existing approaches to evaluation often neglect statistical uncertainty quantification. With an applied statistics audience in mind, we provide background on LLM text generation and evaluation, and then describe a Bayesian approach for quantifying uncertainty in binary evaluation metrics. We focus in particular on uncertainty that is induced by the probabilistic text generation strategies typically deployed in LLM-based systems. We present two case studies applying this approach: 1) evaluating refusal rates on a benchmark of adversarial inputs designed to elicit harmful responses, and 2) evaluating pairwise preferences of one LLM over another on a benchmark of open-ended interactive dialogue examples. We demonstrate how the Bayesian approach can provide useful uncertainty quantification about the behavior of LLM-based systems.



Flow Matching for Scalable Simulation-Based Inference

Neural Information Processing Systems

Figure 1: Comparison of network architectures (left) and flow trajectories (right). Discrete flows (NPE, top) require a specialized architecture for the density estimator. Continuous flows (FMPE, bottom) are based on a vector field parametrized with an unconstrained architecture.


Flow Matching for Scalable Simulation-Based Inference Jonas Wildberger

Neural Information Processing Systems

Figure 1: Comparison of network architectures (left) and flow trajectories (right). Discrete flows (NPE, top) require a specialized architecture for the density estimator. Continuous flows (FMPE, bottom) are based on a vector field parametrized with an unconstrained architecture.


Dynamic Bottleneck for Robust Self-Supervised Exploration

Neural Information Processing Systems

However, such methods are usually sensitive to environmental dynamics-irrelevant information, e.g., white-noise. To handle such dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle.


Provably Safe Stein Variational Clarity-Aware Informative Planning

arXiv.org Artificial Intelligence

Autonomous robots are increasingly deployed for information-gathering tasks in environments that vary across space and time. Planning informative and safe trajectories in such settings is challenging because information decays when regions are not revisited. Most existing planners model information as static or uniformly decaying, ignoring environments where the decay rate varies spatially; those that model non-uniform decay often overlook how it evolves along the robot's motion, and almost all treat safety as a soft penalty. In this paper, we address these challenges. We model uncertainty in the environment using clarity, a normalized representation of differential entropy from our earlier work that captures how information improves through new measurements and decays over time when regions are not revisited. Building on this, we present Stein V ariational Clarity-A ware Informative Planning, a framework that embeds clarity dynamics within trajectory optimization and enforces safety through a low-level filtering mechanism based on our earlier gatekeeper framework for safety verification. The planner performs Bayesian inference-based learning via Stein variational inference, refining a distribution over informative trajectories while filtering each nominal Stein informative trajectory to ensure safety. Hardware experiments and simulations across environments with varying decay rates and obstacles demonstrate consistent safety and reduced information deficits.


BATIS: Bayesian Approaches for Targeted Improvement of Species Distribution Models

arXiv.org Artificial Intelligence

Species distribution models (SDMs), which aim to predict species occurrence based on environmental variables, are widely used to monitor and respond to biodiversity change. Recent deep learning advances for SDMs have been shown to perform well on complex and heterogeneous datasets, but their effectiveness remains limited by spatial biases in the data. In this paper, we revisit deep SDMs from a Bayesian perspective and introduce BATIS, a novel and practical framework wherein prior predictions are updated iteratively using limited observational data. Models must appropriately capture both aleatoric and epistemic uncertainty to effectively combine fine-grained local insights with broader ecological patterns. We benchmark an extensive set of uncertainty quantification approaches on a novel dataset including citizen science observations from the eBird platform. Our empirical study shows how Bayesian deep learning approaches can greatly improve the reliability of SDMs in data-scarce locations, which can contribute to ecological understanding and conservation efforts.


Theory and computation for structured variational inference

arXiv.org Machine Learning

Structured variational inference constitutes a core methodology in modern statistical applications. Unlike mean-field variational inference, the approximate posterior is assumed to have interdependent structure. We consider the natural setting of star-structured variational inference, where a root variable impacts all the other ones. We prove the first results for existence, uniqueness, and self-consistency of the variational approximation. In turn, we derive quantitative approximation error bounds for the variational approximation to the posterior, extending prior work from the mean-field setting to the star-structured setting. We also develop a gradient-based algorithm with provable guarantees for computing the variational approximation using ideas from optimal transport theory. We explore the implications of our results for Gaussian measures and hierarchical Bayesian models, including generalized linear models with location family priors and spike-and-slab priors with one-dimensional debiasing. As a by-product of our analysis, we develop new stability results for star-separable transport maps which might be of independent interest.


Exploiting individual differences to bootstrap communication

arXiv.org Artificial Intelligence

Establishing a communication system is hard because the intended meaning of a signal is unknown to its receiver when first produced, and the signaller also has no idea how that signal will be interpreted. Most theoretical accounts of the emergence of communication systems rely on feedback to reinforce behaviours that have led to successful communication in the past. However, providing such feedback requires already being able to communicate the meaning that was intended or interpreted. Therefore these accounts cannot explain how communication can be bootstrapped from non-communicative behaviours. Here we present a model that shows how a communication system, capable of expressing an unbounded number of meanings, can emerge as a result of individual behavioural differences in a large population without any pre-existing means to determine communicative success. The two key cognitive capabilities responsible for this outcome are behaving predictably in a given situation, and an alignment of psychological states ahead of signal production that derives from shared intentionality. Since both capabilities can exist independently of communication, our results are compatible with theories in which large flexible socially-learned communication systems like language are the product of a general but well-developed capacity for social cognition.