Goto

Collaborating Authors

 Bayesian Inference


Bayesian Inference and Learning in Nonlinear Dynamical Systems: A Framework for Incorporating Explicit and Implicit Prior Knowledge

arXiv.org Machine Learning

Accuracy and generalization capabilities are key objectives when learning dynamical system models. To obtain such models from limited data, current works exploit prior knowledge and assumptions about the system. However, the fusion of diverse prior knowledge, e. g. partially known system equations and smoothness assumptions about unknown model parts, with information contained in the data remains a challenging problem, especially in input-output settings with latent system state. In particular, learning functions that are nested inside known system equations can be a laborious and error-prone expert task. This paper considers inference of latent states and learning of unknown model parts for fusion of data information with different sources of prior knowledge. The main contribution is a general-purpose system identification tool that, for the first time, provides a consistent solution for both, online and offline Bayesian inference and learning while allowing to incorporate explicit and implicit prior system knowledge. We propose a novel interface for combining known dynamics functions with a learning-based approximation of unknown system parts. Based on the proposed model structure, closed-form densities for efficient parameter marginalization are derived. No user-tailored coordinate transformations or model inversions are needed, making the presented framework a general-purpose tool for inference and learning. The broad applicability of the devised framework is illustrated in three distinct case studies, including an experimental data set.


On Prior Distributions for Orthogonal Function Sequences

arXiv.org Machine Learning

We propose a novel class of prior distributions for sequences of orthogonal functions, which are frequently required in various statistical models such as functional principal component analysis (FPCA). Our approach constructs priors sequentially by imposing adaptive orthogonality constraints through a hierarchical formulation of conditionally normal distributions. The orthogonality is controlled via hyperparameters, allowing for flexible trade-offs between exactness and smoothness, which can be learned from the observed data. We illustrate the properties of the proposed prior and show that it leads to nearly orthogonal posterior estimates. The proposed prior is employed in Bayesian FPCA, providing more interpretable principal functions and efficient low-rank representations. Through simulation studies and analysis of human mobility data in Tokyo, we demonstrate the superior performance of our approach in inducing orthogonality and improving functional component estimation.


Wormhole Dynamics in Deep Neural Networks

arXiv.org Artificial Intelligence

This work investigates the generalization behavior of deep neural networks (DNNs), focusing on the phenomenon of "fooling examples," where DNNs confidently classify inputs that appear random or unstructured to humans. To explore this phenomenon, we introduce an analytical framework based on maximum likelihood estimation, without adhering to conventional numerical approaches that rely on gradient-based optimization and explicit labels. Our analysis reveals that DNNs operating in an overparameterized regime exhibit a collapse in the output feature space. While this collapse improves network generalization, adding more layers eventually leads to a state of degeneracy, where the model learns trivial solutions by mapping distinct inputs to the same output, resulting in zero loss. Further investigation demonstrates that this degeneracy can be bypassed using our newly derived "wormhole" solution. The wormhole solution, when applied to arbitrary fooling examples, reconciles meaningful labels with random ones and provides a novel perspective on shortcut learning. These findings offer deeper insights into DNN generalization and highlight directions for future research on learning dynamics in unsupervised settings to bridge the gap between theory and practice.


A "good regulator theorem" for embodied agents

arXiv.org Artificial Intelligence

In a classic paper, Conant and Ashby claimed that "every good regulator of a system must be a model of that system." Artificial Life has produced many examples of systems that perform tasks with apparently no model in sight; these suggest Conant and Ashby's theorem doesn't easily generalise beyond its restricted setup. Nevertheless, here we show that a similar intuition can be fleshed out in a different way: whenever an agent is able to perform a regulation task, it is possible for an observer to interpret it as having "beliefs" about its environment, which it "updates" in response to sensory input. This notion of belief updating provides a notion of model that is more sophisticated than Conant and Ashby's, as well as a theorem that is more broadly applicable. However, it necessitates a change in perspective, in that the observer plays an essential role in the theory: models are not a mere property of the system but are imposed on it from outside. Our theorem holds regardless of whether the system is regulating its environment in a classic control theory setup, or whether it's regulating its own internal state; the model is of its environment either way. The model might be trivial, however, and this is how the apparent counterexamples are resolved.


Tutorial on the Probabilistic Unification of Estimation Theory, Machine Learning, and Generative AI

arXiv.org Artificial Intelligence

Extracting meaning from uncertain, noisy data is a fundamental problem across time series analysis, pattern recognition, and language modeling. This survey presents a unified mathematical framework that connects classical estimation theory, statistical inference, and modern machine learning, including deep learning and large language models. By analyzing how techniques such as maximum likelihood estimation, Bayesian inference, and attention mechanisms address uncertainty, the paper illustrates that many AI methods are rooted in shared probabilistic principles. Through illustrative scenarios including system identification, image classification, and language generation, we show how increasingly complex models build upon these foundations to tackle practical challenges like overfitting, data sparsity, and interpretability. In other words, the work demonstrates that maximum likelihood, MAP estimation, Bayesian classification, and deep learning all represent different facets of a shared goal: inferring hidden causes from noisy and/or biased observations. It serves as both a theoretical synthesis and a practical guide for students and researchers navigating the evolving landscape of machine learning.





Towards Unified Probabilistic Verification and Validation of Vision-Based Autonomy

arXiv.org Artificial Intelligence

Precise and comprehensive situational awareness is a critical capability of modern autonomous systems. Deep neural networks that perceive task-critical details from rich sensory signals have become ubiquitous; however, their black-box behavior and sensitivity to environmental uncertainty and distribution shifts make them challenging to verify formally. Abstraction-based verification techniques for vision-based autonomy produce safety guarantees contingent on rigid assumptions, such as bounded errors or known unique distributions. Such overly restrictive and inflexible assumptions limit the validity of the guarantees, especially in diverse and uncertain test-time environments. We propose a methodology that unifies the verification models of perception with their offline validation. Our methodology leverages interval MDPs and provides a flexible end-to-end guarantee that adapts directly to the out-of-distribution test-time conditions. We evaluate our methodology on a synthetic perception Markov chain with well-defined state estimation distributions and a mountain car benchmark. Our findings reveal that we can guarantee tight yet rigorous bounds on overall system safety.


Measuring IIA Violations in Similarity Choices with Bayesian Models

arXiv.org Machine Learning

Similarity choice data occur when humans make choices among alternatives based on their similarity to a target, e.g., in the context of information retrieval and in embedding learning settings. Classical metric-based models of similarity choice assume independence of irrelevant alternatives (IIA), a property that allows for a simpler formulation. While IIA violations have been detected in many discrete choice settings, the similarity choice setting has received scant attention. This is because the target-dependent nature of the choice complicates IIA testing. We propose two statistical methods to test for IIA: a classical goodness-of-fit test and a Bayesian counterpart based on the framework of Posterior Predictive Checks (PPC). This Bayesian approach, our main technical contribution, quantifies the degree of IIA violation beyond its mere significance. We curate two datasets: one with choice sets designed to elicit IIA violations, and another with randomly generated choice sets from the same item universe. Our tests confirmed significant IIA violations on both datasets, and notably, we find a comparable degree of violation between them. Further, we devise a new PPC test for population homogeneity. Results show that the population is indeed homogenous, suggesting that the IIA violations are driven by context effects -- specifically, interactions within the choice sets. These results highlight the need for new similarity choice models that account for such context effects.