Goto

Collaborating Authors

 Bayesian Learning


Latent Variable Modeling for Robust Causal Effect Estimation

arXiv.org Artificial Intelligence

Latent variable models provide a powerful framework for incorporating and inferring unobserved factors in observational data. In causal inference, they help account for hidden factors influencing treatment or outcome, thereby addressing challenges posed by missing or unmeasured covariates. This paper proposes a new framework that integrates latent variable modeling into the double machine learning (DML) paradigm to enable robust causal effect estimation in the presence of such hidden factors. We consider two scenarios: one where a latent variable affects only the outcome, and another where it may influence both treatment and outcome. To ensure tractability, we incorporate latent variables only in the second stage of DML, separating representation learning from latent inference. We demonstrate the robustness and effectiveness of our method through extensive experiments on both synthetic and real-world datasets.


A Unified Theory of Language

arXiv.org Artificial Intelligence

A unified theory of language combines a Bayesian cognitive linguistic model of language processing, with the proposal that language evolved by sexual selection for the display of intelligence. The theory accounts for the major facts of language, including its speed and expressivity, and data on language diversity, pragmatics, syntax and semantics. The computational element of the theory is based on Construction Grammars. These give an account of the syntax and semantics of the worlds languages, using constructions and unification. Two novel elements are added to construction grammars: an account of language pragmatics, and an account of fast, precise language learning. Constructions are represented in the mind as graph like feature structures. People use slow general inference to understand the first few examples they hear of any construction. After that it is learned as a feature structure, and is rapidly applied by unification. All aspects of language (phonology, syntax, semantics, and pragmatics) are seamlessly computed by fast unification; there is no boundary between semantics and pragmatics. This accounts for the major puzzles of pragmatics, and for detailed pragmatic phenomena. Unification is Bayesian maximum likelihood pattern matching. This gives evolutionary continuity between language processing in the human brain, and Bayesian cognition in animal brains. Language is the basis of our mind reading abilities, our cooperation, self esteem and emotions; the foundations of human culture and society.


Unfolding AlphaFold's Bayesian Roots in Probability Kinematics

arXiv.org Artificial Intelligence

We present a novel theoretical interpretation of AlphaFold1 that reveals the potential of generalized Bayesian updating for probabilistic deep learning. The seminal breakthrough of AlphaFold1 in protein structure prediction by deep learning relied on a learned potential energy function, in contrast to the later end-to-end architectures of AlphaFold2 and AlphaFold3. While this potential was originally justified by referring to physical potentials of mean force (PMFs), we reinterpret AlphaFold1's potential as an instance of {\em probability kinematics} -- also known as {\em Jeffrey conditioning} -- a principled but under-recognised generalization of conventional Bayesian updating. Probability kinematics accommodates uncertain or {\em soft} evidence in the form of updated probabilities over a partition. This perspective reveals AlphaFold1's potential as a form of generalized Bayesian updating, rather than a thermodynamic potential. To confirm our probabilistic framework's scope and precision, we analyze a synthetic 2D model in which an angular random walk prior is updated with evidence on distances via probability kinematics, mirroring AlphaFold1's approach. This theoretical contribution connects AlphaFold1 to a broader class of well-justified Bayesian methods, allowing precise quantification, surpassing merely qualitative heuristics based on PMFs. Our contribution is theoretical: we replace AlphaFold1's heuristic analogy with a principled probabilistic framework, tested in a controlled synthetic setting where correctness can be assessed. More broadly, our results point to the considerable promise of probability kinematics for probabilistic deep learning, by allowing the formulation of complex models from a few simpler components.


Active Query Selection for Crowd-Based Reinforcement Learning

arXiv.org Artificial Intelligence

Preference-based reinforcement learning has gained prominence as a strategy for training agents in environments where the reward signal is difficult to specify or misaligned with human intent. However, its effectiveness is often limited by the high cost and low availability of reliable human input, especially in domains where expert feedback is scarce or errors are costly. To address this, we propose a novel framework that combines two complementary strategies: probabilistic crowd modelling to handle noisy, multi-annotator feedback, and active learning to prioritize feedback on the most informative agent actions. We extend the Advise algorithm to support multiple trainers, estimate their reliability online, and incorporate entropy-based query selection to guide feedback requests. We evaluate our approach in a set of environments that span both synthetic and real-world-inspired settings, including 2D games (Taxi, Pacman, Frozen Lake) and a blood glucose control task for Type 1 Diabetes using the clinically approved UVA/Padova simulator. Our preliminary results demonstrate that agents trained with feedback on uncertain trajectories exhibit faster learning in most tasks, and we outperform the baselines for the blood glucose control task.


CausalMACE: Causality Empowered Multi-Agents in Minecraft Cooperative Tasks

arXiv.org Artificial Intelligence

Minecraft, as an open-world virtual interactive environment, has become a prominent platform for research on agent decision-making and execution. Existing works primarily adopt a single Large Language Model (LLM) agent to complete various in-game tasks. However, for complex tasks requiring lengthy sequences of actions, single-agent approaches often face challenges related to inefficiency and limited fault tolerance. Despite these issues, research on multi-agent collaboration remains scarce. In this paper, we propose CausalMACE, a holistic causality planning framework designed to enhance multi-agent systems, in which we incorporate causality to manage dependencies among subtasks. Technically, our proposed framework introduces two modules: an overarching task graph for global task planning and a causality-based module for dependency management, where inherent rules are adopted to perform causal intervention. Experimental results demonstrate our approach achieves state-of-the-art performance in multi-agent cooperative tasks of Minecraft.


A Novel Framework for Uncertainty Quantification via Proper Scores for Classification and Beyond

arXiv.org Machine Learning

In this PhD thesis, we propose a novel framework for uncertainty quantification in machine learning, which is based on proper scores. Uncertainty quantification is an important cornerstone for trustworthy and reliable machine learning applications in practice. Usually, approaches to uncertainty quantification are problem-specific, and solutions and insights cannot be readily transferred from one task to another. Proper scores are loss functions minimized by predicting the target distribution. Due to their very general definition, proper scores apply to regression, classification, or even generative modeling tasks. We contribute several theoretical results, that connect epistemic uncertainty, aleatoric uncertainty, and model calibration with proper scores, resulting in a general and widely applicable framework. We achieve this by introducing a general bias-variance decomposition for strictly proper scores via functional Bregman divergences. Specifically, we use the kernel score, a kernel-based proper score, for evaluating sample-based generative models in various domains, like image, audio, and natural language generation. This includes a novel approach for uncertainty estimation of large language models, which outperforms state-of-the-art baselines. Further, we generalize the calibration-sharpness decomposition beyond classification, which motivates the definition of proper calibration errors. We then introduce a novel estimator for proper calibration errors in classification, and a novel risk-based approach to compare different estimators for squared calibration errors. Last, we offer a decomposition of the kernel spherical score, another kernel-based proper score, allowing a more fine-grained and interpretable evaluation of generative image models.


Multidimensional Distributional Neural Network Output Demonstrated in Super-Resolution of Surface Wind Speed

arXiv.org Machine Learning

Accurate quantification of uncertainty in neural network predictions remains a central challenge for scientific applications involving high-dimensional, correlated data. While existing methods capture either aleatoric or epistemic uncertainty, few offer closed-form, multidimensional distributions that preserve spatial correlation while remaining computationally tractable. In this work, we present a framework for training neural networks with a multidimensional Gaussian loss, generating closed-form predictive distributions over outputs with non-identically distributed and heteroscedastic structure. Our approach captures aleatoric uncertainty by iteratively estimating the means and covariance matrices, and is demonstrated on a super-resolution example. We leverage a Fourier representation of the covariance matrix to stabilize network training and preserve spatial correlation. We introduce a novel regularization strategy -- referred to as information sharing -- that interpolates between image-specific and global covariance estimates, enabling convergence of the super-resolution downscaling network trained on image-specific distributional loss functions. This framework allows for efficient sampling, explicit correlation modeling, and extensions to more complex distribution families all without disrupting prediction performance. We demonstrate the method on a surface wind speed downscaling task and discuss its broader applicability to uncertainty-aware prediction in scientific models.


Algebraic Approach to Ridge-Regularized Mean Squared Error Minimization in Minimal ReLU Neural Network

arXiv.org Machine Learning

This paper investigates a perceptron, a simple neural network model, with ReLU activation and a ridge-regularized mean squared error (RR-MSE). Our approach leverages the fact that the RR-MSE for ReLU perceptron is piecewise polynomial, enabling a systematic analysis using tools from computational algebra. In particular, we develop a Divide-Enumerate-Merge strategy that exhaustively enumerates all local minima of the RR-MSE. By virtue of the algebraic formulation, our approach can identify not only the typical zero-dimensional minima (i.e., isolated points) obtained by numerical optimization, but also higher-dimensional minima (i.e., connected sets such as curves, surfaces, or hypersurfaces). Although computational algebraic methods are computationally very intensive for perceptrons of practical size, as a proof of concept, we apply the proposed approach in practice to minimal perceptrons with a few hidden units.


Who Wins the Race? (R Vs Python) - An Exploratory Study on Energy Consumption of Machine Learning Algorithms

arXiv.org Artificial Intelligence

The utilization of Machine Learning (ML) in contemporary software systems is extensive and continually expanding. However, its usage is energy-intensive, contributing to increased carbon emissions and demanding significant resources. While numerous studies examine the performance and accuracy of ML, only a limited few focus on its environmental aspects, particularly energy consumption. In addition, despite emerging efforts to compare energy consumption across various programming languages for specific algorithms and tasks, there remains a gap specifically in comparing these languages for ML-based tasks. This paper aims to raise awareness of the energy costs associated with employing different programming languages for ML model training and inference. Through this empirical study, we measure and compare the energy consumption along with run-time performance of five regression and five classification tasks implemented in Python and R, the two most popular programming languages in this context. Our study results reveal a statistically significant difference in costs between the two languages in 95% of the cases examined. Furthermore, our analysis demonstrates that the choice of programming language can influence energy efficiency significantly, up to 99.16% during model training and up to 99.8% during inferences, for a given ML task.


Introduction to Regularization and Learning Methods for Inverse Problems

arXiv.org Artificial Intelligence

These lecture notes evolve around mathematical concepts arising in inverse problems. We start by introducing inverse problems through examples such as differentiation, deconvolution, computed tomography and phase retrieval. This then leads us to the framework of well-posedness and first considerations regarding reconstruction and inversion approaches. The second chapter then first deals with classical regularization theory of inverse problems in Hilbert spaces. After introducing the pseudo-inverse, we review the concept of convergent regularization. Within this chapter we then proceed to ask the question of how to realize practical reconstruction algorithms. Here, we mainly focus on Tikhonov and sparsity promoting regularization in finite dimensional spaces. In the third chapter, we dive into modern deep-learning methods, which allow solving inverse problems in a data-dependent approach. The intersection between inverse problems and machine learning is a rapidly growing field and our exposition here restricts itself to a very limited selection of topics. Among them are learned regularization, fully-learned Bayesian estimation, post-processing strategies and plug-n-play methods.