 minimiser


Soft Specialists: $α$-Rényi Ensembles for Uncertainty-Aware LLM Post-Training

arXiv.org Machine Learning

Existing training approaches for large language models learn a single set of parameters from large volumes of data that are typically heterogeneous, conflicting, and often outright contradictory. As a result, the model is forced to compress conflicting goals and inherent uncertainties into a single, averaged pattern of behaviour. We propose an $α$-Rényi variational framework for learning distributions over post-training parameters, offering an uncertainty-aware alternative to deep ensemble approaches. The resulting variational objective interpolates between classical variational Bayes and predictively oriented posterior learning, balancing globally plausible individual models against systems of complementary specialists. We identify local stability criteria, demonstrating how model misspecification can make non-degenerate posterior spread locally favourable, manifesting contradictory or conflicting data as epistemic uncertainty. We apply our framework to LLM post-training, learning an ensemble of LoRA adapters attached to a shared, frozen base model, providing a scalable training procedure for both supervised fine-tuning and preference optimisation. Our approach enables training examples to be softly routed across ensemble members, promoting model specialisation and providing actionable uncertainty estimates across different tasks.
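For orientation, the textbook definition of the Rényi divergence of order $α$, to which the framework's name refers, is sketched below. The abstract does not state how the objective is built from it, so the symbols $p$, $q$ and $\mu$ are generic densities and a dominating measure, assumed here purely for illustration.

```latex
% Standard Rényi divergence of order \alpha between distributions P and Q
% with densities p and q with respect to a common measure \mu.
% Generic notation; not taken from the paper.
D_{\alpha}(P \,\|\, Q)
  = \frac{1}{\alpha - 1}
    \log \int p(\theta)^{\alpha}\, q(\theta)^{1-\alpha}\, \mathrm{d}\mu(\theta),
  \qquad \alpha > 0,\ \alpha \neq 1,
\qquad
\lim_{\alpha \to 1} D_{\alpha}(P \,\|\, Q) = \mathrm{KL}(P \,\|\, Q).
```

The $α \to 1$ limit recovering the Kullback-Leibler divergence is the usual reason a Rényi-based objective can contain classical variational Bayes as a special case.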




Pitfalls of Epistemic Uncertainty Quantification through Loss Minimisation

Neural Information Processing Systems

Uncertainty quantification has received increasing attention in machine learning in the recent past. In particular, a distinction between aleatoric and epistemic uncertainty has been found useful in this regard. The latter refers to the learner's (lack of) knowledge and appears to be especially difficult to measure and quantify. In this paper, we analyse a recent proposal based on the idea of a second-order learner, which yields predictions in the form of distributions over probability distributions. While standard (first-order) learners can be trained to predict accurate probabilities, namely by minimising suitable loss functions on sample data, we show that loss minimisation does not work for second-order predictors: The loss functions proposed for inducing such predictors do not incentivise the learner to represent its epistemic uncertainty in a faithful way.


Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings

arXiv.org Artificial Intelligence

Many high-dimensional optimisation problems exhibit rich geometric structures in their set of minimisers, often forming smooth manifolds due to over-parametrisation or symmetries. When this structure is known, at least locally, it can be exploited through reduction mappings that reparametrise part of the parameter space to lie on the solution manifold. These reductions naturally arise from inner optimisation problems and effectively remove redundant directions, yielding a lower-dimensional objective. In this work, we introduce a general framework to understand how such reductions influence the optimisation landscape. We show that well-designed reduction mappings improve curvature properties of the objective, leading to better-conditioned problems and theoretically faster convergence for gradient-based methods. Our analysis unifies a range of scenarios where structural information at optimality is leveraged to accelerate convergence, offering a principled explanation for the empirical gains observed in such optimisation algorithms.
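A minimal sketch of the idea on a toy problem, assuming a strictly convex quadratic objective $\tfrac12 z^\top H z$ with $z = (x, y)$: solving the inner problem in $y$ exactly leaves a reduced objective in $x$ whose Hessian is the Schur complement of $H$. The function names and the quadratic setting are illustrative assumptions, not the paper's construction.

```python
import numpy as np


def reduced_hessian(H, k):
    """Hessian of g(x) = min_y 0.5 * [x; y]^T H [x; y].

    H is symmetric positive definite and k is the number of retained
    coordinates x. The inner minimiser is y*(x) = -C^{-1} B^T x, so the
    reduced Hessian is the Schur complement A - B C^{-1} B^T.
    """
    A, B, C = H[:k, :k], H[:k, k:], H[k:, k:]
    return A - B @ np.linalg.solve(C, B.T)


def condition_number(M):
    """Ratio of largest to smallest eigenvalue of a symmetric PD matrix."""
    eig = np.linalg.eigvalsh(M)  # eigenvalues in ascending order
    return eig[-1] / eig[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((6, 6))
    H = M @ M.T + 0.1 * np.eye(6)  # random symmetric positive definite matrix
    print("full problem   :", condition_number(H))
    print("reduced problem:", condition_number(reduced_hessian(H, 3)))
```

In this quadratic case the reduced condition number can never exceed the original one, because the inverse of the Schur complement is a principal block of $H^{-1}$; gradient descent with a fixed step therefore has a contraction factor on the reduced problem that is at least as good.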


Minimisation of Submodular Functions Using Gaussian Zeroth-Order Random Oracles

arXiv.org Artificial Intelligence

We consider the minimisation problem of submodular functions and investigate the application of a zeroth-order method to this problem. The method is based on exploiting a Gaussian smoothing random oracle to estimate the smoothed function gradient. We prove the convergence of the algorithm to a global $ε$-approximate solution in the offline case and show that the algorithm is Hannan-consistent in the online case with respect to static regret. Moreover, we show that the algorithm achieves $O(\sqrt{NP_N^\ast})$ dynamic regret, where $N$ is the number of iterations and $P_N^\ast$ is the path length. The complexity analysis and hyperparameter selection are presented for all the cases. The theoretical results are illustrated via numerical examples.
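A rough sketch of how such a scheme can look, under assumptions the abstract does not confirm: the set function is relaxed to $[0,1]^n$ through its Lovász extension, a single-sample Gaussian smoothing estimator stands in for the gradient, and iterates are projected back onto the unit cube. All names (`lovasz_extension`, `gaussian_zo_gradient`, `zo_minimise`) and default hyperparameters are hypothetical.

```python
import numpy as np


def lovasz_extension(set_fn, x):
    """Lovász extension of set_fn at x, measured relative to set_fn(empty set).

    Coordinates are visited in decreasing order and each one is weighted by
    the marginal gain of adding its index to the growing chain of sets.
    """
    order = np.argsort(-x)
    chain = set()
    prev = set_fn(frozenset())
    total = 0.0
    for i in order:
        chain.add(int(i))
        cur = set_fn(frozenset(chain))
        total += x[i] * (cur - prev)
        prev = cur
    return total


def gaussian_zo_gradient(fn, x, mu, rng):
    """One-sample Gaussian smoothing estimate using two function evaluations."""
    u = rng.standard_normal(x.shape)
    return (fn(x + mu * u) - fn(x)) / mu * u


def zo_minimise(set_fn, n, steps=2000, step_size=0.05, mu=1e-2, seed=0):
    """Projected zeroth-order descent on the Lovász extension over [0, 1]^n."""
    rng = np.random.default_rng(seed)
    x = np.full(n, 0.5)

    def fn(z):
        return lovasz_extension(set_fn, z)

    for _ in range(steps):
        g = gaussian_zo_gradient(fn, x, mu, rng)
        x = np.clip(x - step_size * g, 0.0, 1.0)  # project onto the unit cube
    return x
```

A candidate set can be read off by thresholding the returned vector, for example `{i for i in range(n) if x[i] > 0.5}`. The step size and smoothing radius here are placeholders and would need the kind of tuning the abstract refers to as hyperparameter selection.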



Rates of Convergence of Generalised Variational Inference Posteriors under Prior Misspecification

arXiv.org Machine Learning

We prove rates of convergence and robustness to prior misspecification within a Generalised Variational Inference (GVI) framework with bounded divergences. This addresses a significant open challenge for GVI and Federated GVI methods that employ a divergence other than the Kullback-Leibler divergence under prior misspecification, operate within a subset of possible probability measures, and result in intractable posteriors. Our theoretical contributions cover severe prior misspecification while relying on our ability to restrict the space of possible GVI posterior measures and to infer properties based on this space. In particular, we establish sufficient conditions for existence and uniqueness of GVI posteriors on arbitrary Polish spaces, prove that the GVI posterior measure concentrates on a neighbourhood of loss minimisers, and extend this to rates of convergence regardless of the prior measure.



Mean-Field Generalisation Bounds for Learning Controls in Stochastic Environments

arXiv.org Machine Learning

When solving stochastic control problems, one is often limited by the challenge of specifying realistic model dynamics of the involved processes. Parametric approaches to estimating dynamics introduce model error, while 'model-free' approaches typically suffer severely from the curse of dimensionality. The development of reliable machine-learning-based methods for stochastic control is therefore of significant practical interest. In this paper, we focus on problems where a decision maker faces a stochastic environment, that is, where they interact with a system with unknown and uncontrolled stochastic dynamics, which, together with their control, induce a controlled state process and costs. One example is optimal investment for a small investor: the stochastic dynamics of assets are uncontrolled and unknown, the investor chooses a strategy based on past observations, and together these generate a wealth process which must be optimised. A second example is aerial navigation in the presence of uncertain weather: the weather is unaffected by the navigation policy chosen, while the navigator must account for uncertainties in their planning, and the resulting flight plan needs to be optimised. In both these cases, the stochastic environment is naturally high-dimensional and may not be Markovian, and so is challenging to model statistically using finitely many observations. We consider the setting where we have access to a finite number of i.i.d.