Bayesian Predictive Coding

arXiv.org Machine Learning

Predictive coding (PC) is an influential theory of information processing in the brain, providing a biologically plausible alternative to backpropagation. It is motivated in terms of Bayesian inference, as hidden states and parameters are optimised via gradient descent on variational free energy. However, implementations of PC rely on maximum a posteriori (MAP) estimates of hidden states and maximum likelihood (ML) estimates of parameters, limiting their ability to quantify epistemic uncertainty. In this work, we investigate a Bayesian extension to PC that estimates a posterior distribution over network parameters. This approach, termed Bayesian Predictive Coding (BPC), preserves the locality of PC and results in closed-form Hebbian weight updates. Compared to PC, our BPC algorithm converges in fewer epochs in the full-batch setting and remains competitive in the mini-batch setting. Additionally, we demonstrate that BPC offers uncertainty quantification comparable to existing methods in Bayesian deep learning, while also improving convergence properties. Together, these results suggest that BPC provides a biologically plausible method for Bayesian learning in the brain, as well as an attractive approach to uncertainty quantification in deep learning.
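
To make the closed-form, local character of such updates concrete, here is a minimal sketch of a conjugate Gaussian update for a single linear layer. The Gaussian likelihood, the precision parameter lam, and all variable names (M0, P0) are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

# Minimal sketch of a closed-form Bayesian update for one layer's weights,
# assuming a Gaussian prior over W and a Gaussian likelihood y ~ N(W x, lam^-1 I).
# All names (M0, P0, lam) are illustrative, not the paper's notation.

def bayesian_layer_update(M0, P0, x, y, lam=1.0):
    """Conjugate posterior over W given activities x (n, d_in) and targets y (n, d_out)."""
    P = P0 + lam * x.T @ x                                # precision: built from local x x^T terms
    M = np.linalg.solve(P, P0 @ M0.T + lam * x.T @ y).T   # posterior mean, in closed form
    return M, P

rng = np.random.default_rng(0)
M, P = bayesian_layer_update(np.zeros((2, 3)), np.eye(3),
                             rng.normal(size=(50, 3)), rng.normal(size=(50, 2)))
```

Note that both updates depend only on correlations of pre- and post-synaptic activity, which is what makes a Hebbian, gradient-free parameter update plausible in this family of models.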


Neural Bayes inference for complex bivariate extremal dependence models

arXiv.org Machine Learning

Likelihood-free approaches are appealing for performing inference on complex dependence models, either because it is not possible to formulate a likelihood function, or because its evaluation is very computationally costly. This is the case for several models available in the multivariate extremes literature, particularly for the most flexible tail models, including those that interpolate between the two key dependence classes of 'asymptotic dependence' and 'asymptotic independence'. We focus on approaches that leverage neural networks to approximate Bayes estimators. In particular, we explore the properties of neural Bayes estimators for parameter inference in several models that are flexible but computationally expensive to fit, with a view to aiding their routine implementation. Owing to the absence of likelihood evaluation in the inference procedure, classical information criteria such as the Bayesian information criterion cannot be used to select the most appropriate model. Instead, we propose using neural networks as neural Bayes classifiers for model selection. Our goal is to provide a toolbox for simple, fast fitting and comparison of complex extreme-value dependence models, where the best model is selected for a given data set and its parameters are subsequently estimated using neural Bayes estimation. We apply our classifiers and estimators to analyse the pairwise extremal behaviour of changes in horizontal geomagnetic field fluctuations at three different locations.
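
As a rough illustration of the amortised workflow behind neural Bayes estimation, the sketch below trains a network on simulated parameter-data pairs; the toy Gaussian model, uniform prior, and architecture are placeholder assumptions standing in for an expensive extremal dependence model.

```python
import torch
import torch.nn as nn

# Sketch: simulate (theta, data) pairs from a toy model, then regress theta on
# data. Under squared-error loss the network approximates the posterior mean,
# i.e. a Bayes estimator. The Gaussian toy model and architecture are
# placeholders for an expensive extremal dependence model.

def simulate(n_sets, m=100):
    theta = torch.rand(n_sets, 1) * 4 - 2      # prior: Uniform(-2, 2)
    data = theta + torch.randn(n_sets, m)      # m iid observations per parameter draw
    return data, theta

net = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    x, theta = simulate(256)
    loss = ((net(x) - theta) ** 2).mean()      # empirical Bayes risk under squared loss
    opt.zero_grad(); loss.backward(); opt.step()
```

No likelihood evaluation is needed at any point: the model only has to be simulable, which is exactly the property the paper exploits.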


Information Maximizing Curriculum

Neural Information Processing Systems

Imitation learning uses data for training policies to solve complex tasks. However, when the training data is collected from human demonstrators, it often exhibits multimodal distributions because of the variability in human actions. Most imitation learning methods rely on a maximum likelihood (ML) objective to learn a parameterized policy, but this can result in suboptimal or unsafe behavior due to the mode-averaging property of the ML objective. In this work, we propose Information Maximizing Curriculum, a curriculum-based approach that assigns a weight to each data point and encourages the model to specialize in the data it can represent, effectively mitigating the mode-averaging problem by allowing the model to ignore data from modes it cannot represent. To cover all modes and thus enable diverse behavior, we extend our approach to a mixture-of-experts (MoE) policy, where each mixture component selects its own subset of the training data for learning. A novel maximum-entropy-based objective is proposed to achieve full coverage of the dataset, thereby enabling the policy to encompass all modes within the data distribution. We demonstrate the effectiveness of our approach on complex simulated control tasks using diverse human demonstrations, achieving superior performance compared to state-of-the-art methods.
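
A minimal sketch of the curriculum-weighting idea follows: each data point is weighted by how well the current model explains it, so the model can specialize and ignore modes it cannot represent. The temperature eta is an assumed knob trading concentration against weight entropy; this is a reduction for illustration, not the paper's full MoE objective.

```python
import torch

# Sketch of curriculum weighting: weight each data point by the current model's
# fit, so the model specializes on data it can represent. eta controls the
# entropy of the weights (large eta -> near-uniform, small eta -> concentrated).

def curriculum_weights(log_probs, eta=1.0):
    return torch.softmax(log_probs / eta, dim=0)    # higher model fit -> higher weight

log_probs = torch.tensor([-0.5, -3.0, -0.7, -5.0])  # per-sample model log-likelihoods
w = curriculum_weights(log_probs)
weighted_nll = -(w * log_probs).sum()               # weighted ML objective
```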


Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

Neural Information Processing Systems

Aligning generative models with human preferences via RLHF typically suffers from overoptimization, where an imperfectly learned reward model can misguide the generative model to output undesired responses. We investigate this problem in a principled manner by identifying the source of the misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model: one that simultaneously minimizes a maximum likelihood loss and a reward penalty term. The penalty term is introduced to prevent the policy from choosing actions with spuriously high proxy rewards, resulting in provable sample efficiency of the algorithm under a partial-coverage-style condition. Moving from theory to practice, the proposed algorithm further enjoys an equivalent but surprisingly easy-to-implement reformulation. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines: (i) a preference optimization loss that directly aligns the policy with human preference, and (ii) a supervised learning loss that explicitly imitates the policy with a (suitable) baseline distribution. In the context of aligning large language models (LLMs), this objective fuses the direct preference optimization (DPO) loss with the supervised fine-tuning (SFT) loss to help mitigate overoptimization towards undesired responses, for which we name the algorithm Regularized Preference Optimization (RPO). Experiments on aligning LLMs demonstrate the improved performance of RPO compared with DPO baselines.
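
Since the abstract describes the practical objective as a DPO loss fused with an SFT loss, here is a minimal sketch of that combination; the weighting coefficient alpha and all function and variable names are illustrative assumptions rather than the paper's notation.

```python
import torch
import torch.nn.functional as F

# Sketch of the fused objective described above: a DPO-style preference loss
# plus an SFT loss on the preferred responses. alpha and all names are
# illustrative assumptions, not the paper's notation.

def rpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected,
             beta=0.1, alpha=1.0):
    # DPO term: log-sigmoid of the implicit reward margin vs. a reference policy.
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    dpo = -F.logsigmoid(margin).mean()
    sft = -logp_chosen.mean()    # SFT term: imitate the chosen (baseline) responses
    return dpo + alpha * sft

loss = rpo_loss(torch.tensor([-5.0]), torch.tensor([-7.0]),
                torch.tensor([-5.2]), torch.tensor([-6.8]))
```

The SFT term anchors the policy to the baseline distribution, which is the implicit adversarial regularization the title refers to.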



Trust Your ∇: Gradient-based Intervention Targeting for Causal Discovery

Neural Information Processing Systems

Inferring causal structure from data is a challenging task of fundamental importance in science. Often, observational data alone is not enough to uniquely identify a system's causal structure. The use of interventional data can address this issue; however, acquiring these samples typically demands a considerable investment of time and physical or financial resources. In this work, we are concerned with the acquisition of interventional data in a targeted manner to minimize the number of required experiments. We propose a novel Gradient-based Intervention Targeting method, abbreviated GIT, that 'trusts' the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention targeting function. We provide extensive experiments on simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime.
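
One plausible reading of "trusting" the gradient estimator is sketched below: score each candidate node by the gradient mass the discovery objective places on its incident edges and intervene on the argmax. The scoring rule and the stand-in grad_adjacency input are assumptions for illustration; the actual targeting function in GIT may differ.

```python
import numpy as np

# Sketch of gradient-based targeting: score each candidate node by the gradient
# magnitude of the discovery objective on its incident edges and intervene on
# the highest-scoring node. 'grad_adjacency' is an assumed stand-in for the
# framework's gradient estimator.

def pick_intervention_target(grad_adjacency):
    # grad_adjacency[i, j]: gradient w.r.t. the candidate edge j -> i
    scores = np.abs(grad_adjacency).sum(axis=1)   # per-node gradient mass
    return int(np.argmax(scores))

grads = np.random.default_rng(0).normal(size=(5, 5))
target = pick_intervention_target(grads)          # node to intervene on next
```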



Semantic Probabilistic Layers for Neuro-Symbolic Learning

Neural Information Processing Systems

We design a predictive layer for structured-output prediction (SOP) that can be plugged into any neural network, guaranteeing that its predictions are consistent with a set of predefined symbolic constraints. Our Semantic Probabilistic Layer (SPL) can model intricate correlations and hard constraints over a structured output space, all while being amenable to end-to-end learning via maximum likelihood. SPLs combine exact probabilistic inference with logical reasoning in a clean and modular way, learning complex distributions and restricting their support to the solutions of the constraint.
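
To illustrate the core mechanism in the simplest possible setting, the sketch below masks out every structured output that violates the constraint and renormalises, so the layer's distribution has support only on valid solutions. Real SPLs achieve this with tractable probabilistic circuits rather than the explicit enumeration assumed here.

```python
import torch

# Sketch: zero out (mask to -inf) every candidate output violating the symbolic
# constraint, then renormalise. The result is a distribution whose support is
# restricted to constraint-satisfying outputs, as an SPL guarantees.

def constrained_distribution(logits, valid_mask):
    # logits: (k,) scores over k candidate outputs; valid_mask: (k,) bools
    masked = logits.masked_fill(~valid_mask, float('-inf'))
    return torch.softmax(masked, dim=0)            # p(y | x) with restricted support

p = constrained_distribution(torch.randn(6),
                             torch.tensor([1, 0, 1, 1, 0, 1], dtype=torch.bool))
```

Because the masking and renormalisation are differentiable in the logits, such a layer remains trainable end-to-end via maximum likelihood.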



Noether's Razor: Learning Conserved Quantities

Neural Information Processing Systems

Symmetries have proven useful in machine learning models, improving generalisation and overall performance. At the same time, recent advancements in learning dynamical systems rely on modelling the underlying Hamiltonian to guarantee the conservation of energy. These approaches can be connected via a seminal result in mathematical physics: Noether's theorem, which states that symmetries in a dynamical system correspond to conserved quantities. This work uses Noether's theorem to parameterise symmetries as learnable conserved quantities. We then allow conserved quantities and associated symmetries to be learned directly from training data through approximate Bayesian model selection, jointly with the regular training procedure. As a training objective, we derive a variational lower bound to the marginal likelihood. The objective automatically embodies an Occam's Razor effect that avoids the collapse of conservation laws to the trivial constant, without the need to manually add and tune additional regularisers. We demonstrate a proof-of-principle on n-harmonic oscillators and n-body systems. We find that our method identifies the correct conserved quantities and the U(n) and SE(n) symmetry groups, improving overall performance and predictive accuracy on test data.
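
For reference, a variational lower bound on the marginal likelihood generically takes the form below, where φ abstracts the learnable conserved-quantity (symmetry) parameters; this is the standard ELBO shape, not the paper's exact derivation. The KL term acts as the complexity penalty behind the Occam's Razor effect mentioned above, discouraging symmetry parameterisations the data do not support.

```latex
\log p(\mathcal{D}) \;\ge\; \mathbb{E}_{q(\varphi)}\big[\log p(\mathcal{D} \mid \varphi)\big] - \mathrm{KL}\big(q(\varphi) \,\|\, p(\varphi)\big)
```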