 Bayesian Learning


Building causation links in stochastic nonlinear systems from data

arXiv.org Artificial Intelligence

Causal relationships play a fundamental role in understanding the world around us. The ability to identify and understand cause-effect relationships is critical to making informed decisions, predicting outcomes, and developing effective strategies. However, deciphering causal relationships from observational data is a difficult task, as correlations alone may not provide definitive evidence of causality. In recent years, the field of machine learning (ML) has emerged as a powerful tool, offering new opportunities for uncovering hidden causal mechanisms and better understanding complex systems. In this work, we address the problem of detecting the intrinsic causal links of a large class of complex systems within the framework of response theory in physics. We develop some theoretical ideas put forward by [1] and, on the technical side, use state-of-the-art ML techniques to build models from data. We consider both linear stochastic and non-linear systems. Finally, we compute the asymptotic efficiency of the linear-response-based causal predictor for a large-scale Markov process network with linear interactions.


Temporal Counterfactual Explanations of Behaviour Tree Decisions

arXiv.org Artificial Intelligence

Explainability is a critical tool in helping stakeholders understand robots. In particular, the ability for robots to explain why they have made a particular decision or behaved in a certain way is useful in this regard. Behaviour trees are a popular framework for controlling the decision-making of robots and other software systems, and thus a natural question to ask is whether or not a system driven by a behaviour tree is capable of answering "why" questions. While explainability for behaviour trees has seen some prior attention, no existing methods are capable of generating causal, counterfactual explanations which detail the reasons for robot decisions and behaviour. Therefore, in this work, we introduce a novel approach which automatically generates counterfactual explanations in response to contrastive "why" questions. Our method achieves this by first automatically building a causal model from the structure of the behaviour tree as well as domain knowledge about the state and individual behaviour tree nodes. The resultant causal model is then queried and searched to find a set of diverse counterfactual explanations. We demonstrate that our approach is able to correctly explain the behaviour of a wide range of behaviour tree structures and states. By being able to answer a wide range of causal queries, our approach represents a step towards more transparent, understandable and ultimately trustworthy robotic systems.


Basis Vector Metric: A Method for Robust Open-Ended State Change Detection

arXiv.org Artificial Intelligence

We test a new method, abbreviated BVM (Basis Vectors Method), on its ability to judge state changes in images using language embeddings. All of our data come from the MIT-States dataset, which contains about 53,000 images covering 225 nouns and 115 adjectives, with each noun having about 9 different adjectives, forming approximately 1000 noun-adjective pairs. In our first experiment, we test our method's ability to determine the state of each noun class separately and compare it against other metrics: cosine similarity, dot product, product quantization, binary index, Naive Bayes, and a custom neural network. Among these metrics, we found that our proposed BVM performs best in classifying the states for each noun. In a second experiment, we test whether BVM can differentiate adjectives from one another, treating each adjective separately. We compared BVM's ability to differentiate adjectives against the method proposed in the MIT-States paper, a logistic regression model. In the end, we did not find conclusive evidence that our BVM metric could perform better than the logistic regression model at discerning adjectives. We did, however, find evidence of possible improvements to our method, which suggests that its accuracy could be increased through certain changes in our methodology.


TGLF-SINN: Deep Learning Surrogate Model for Accelerating Turbulent Transport Modeling in Fusion

arXiv.org Artificial Intelligence

The Trapped Gyro-Landau Fluid (TGLF) model provides fast, accurate predictions of turbulent transport in tokamaks, but whole device simulations requiring thousands of evaluations remain computationally expensive. Neural network (NN) surrogates offer accelerated inference with fully differentiable approximations that enable gradient-based coupling but typically require large training datasets to capture transport flux variations across plasma conditions, creating significant training burden and limiting applicability to expensive gyrokinetic simulations. We propose TGLF-SINN (Spectra-Informed Neural Network) with three key innovations: (1) principled feature engineering that reduces target prediction range, simplifying the learning task; (2) physics-guided regularization of transport spectra to improve generalization under sparse data; and (3) Bayesian Active Learning (BAL) to strategically select training samples based on model uncertainty, reducing data requirements while maintaining accuracy. Our approach achieves superior performance with significantly less training data. In offline settings, TGLF-SINN reduces logarithmic root mean squared error (LRMSE) by 12.4% compared to the current baseline \base. Using only 25% of the complete dataset with BAL, we achieve LRMSE only 0.0165 higher than \base and 0.0248 higher than our offline model (0.0583). In downstream flux matching applications, our NN surrogate provides 45x speedup over TGLF while maintaining comparable accuracy, demonstrating potential for training efficient surrogates for higher-fidelity models where data acquisition is costly and sparse.


Machine Generalize Learning in Agent-Based Models: Going Beyond Surrogate Models for Calibration in ABMs

arXiv.org Artificial Intelligence

Calibrating agent-based epidemic models is computationally demanding. We present a supervised machine learning calibrator that learns the inverse mapping from epidemic time series to SIR parameters. A three-layer bidirectional LSTM ingests 60-day incidence together with population size and recovery rate, and outputs transmission probability, contact rate, and R0. Training uses a composite loss with an epidemiology-motivated consistency penalty that encourages R0 × recovery rate to equal transmission probability × contact rate. In a 1000-scenario simulation study, we compare the calibrator with Approximate Bayesian Computation (likelihood-free MCMC). The method achieves lower error across all targets (MAE: R0 0.0616 vs 0.275; transmission 0.0715 vs 0.128; contact 1.02 vs 4.24), produces tighter predictive intervals with near-nominal coverage, and reduces wall-clock time from 77.4 s to 2.35 s per calibration. Although contact rate and transmission probability are partially nonidentifiable, the approach reproduces epidemic curves more faithfully than ABC, enabling fast and practical calibration. We evaluate it on SIR agent-based epidemics generated with epiworldR and provide an implementation in R.
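The consistency penalty described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so the squared residual, the mean-squared-error data term, the `weight` value, and the function names below are all assumptions made for illustration, not the authors' implementation (which is in R).

```python
def consistency_penalty(r0, recovery_rate, transmission_prob, contact_rate):
    """Squared residual of the relation R0 * recovery = transmission * contact.

    The squared form is an assumption; the abstract only says the penalty
    encourages the two products to be equal.
    """
    residual = r0 * recovery_rate - transmission_prob * contact_rate
    return residual ** 2


def composite_loss(pred, target, recovery_rate, weight=0.1):
    """Hypothetical composite loss: MSE on the targets plus a weighted penalty.

    pred and target are (transmission_prob, contact_rate, r0) tuples.
    weight is an illustrative hyperparameter, not a value from the paper.
    """
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    transmission_prob, contact_rate, r0 = pred
    penalty = consistency_penalty(r0, recovery_rate, transmission_prob, contact_rate)
    return mse + weight * penalty
```

A prediction that matches its target and satisfies the relation incurs zero loss, while a prediction that fits the target but violates the relation is still penalised.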


Robust variational neural posterior estimation for simulation-based inference

arXiv.org Machine Learning

Recent advances in neural density estimation have enabled powerful simulation-based inference (SBI) methods that can flexibly approximate Bayesian inference for intractable stochastic models. Although these methods have demonstrated reliable posterior estimation when the simulator accurately represents the underlying data-generating process (DGP), recent work has shown that they perform poorly in the presence of model misspecification. This poses a significant problem for their use on real-world problems, because simulators always misrepresent the true DGP to a certain degree. In this paper, we introduce robust variational neural posterior estimation (RVNP), a method which addresses the problem of misspecification in amortised SBI by bridging the simulation-to-reality gap using variational inference and error modelling. We test RVNP on multiple benchmark tasks, including using real data from astronomy, and show that it can recover robust posterior inference in a data-driven manner without adopting tunable hyperparameters or priors governing the misspecification.


Cryo-EM as a Stochastic Inverse Problem

arXiv.org Machine Learning

Cryo-electron microscopy (Cryo-EM) enables high-resolution imaging of biomolecules, but structural heterogeneity remains a major challenge in 3D reconstruction. Traditional methods assume a discrete set of conformations, limiting their ability to recover continuous structural variability. In this work, we formulate cryo-EM reconstruction as a stochastic inverse problem (SIP) over probability measures, where the observed images are modeled as the push-forward of an unknown distribution over molecular structures via a random forward operator. We pose the reconstruction problem as the minimization of a variational discrepancy between observed and simulated image distributions, using statistical distances such as the KL divergence and the Maximum Mean Discrepancy. The resulting optimization is performed over the space of probability measures via a Wasserstein gradient flow, which we numerically solve using particles to represent and evolve conformational ensembles. We validate our approach using synthetic examples, including a realistic protein model, which demonstrates its ability to recover continuous distributions over structural states. We analyze the connection between our formulation and Maximum A Posteriori (MAP) approaches, which can be interpreted as instances of the discretize-then-optimize (DTO) framework. We further provide a consistency analysis, establishing conditions under which DTO methods, such as MAP estimation, converge to the solution of the underlying infinite-dimensional continuous problem. Beyond cryo-EM, the framework provides a general methodology for solving SIPs involving random forward operators.
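One of the statistical distances named in this abstract, the Maximum Mean Discrepancy, can be estimated directly from two sets of samples. The following is a minimal sketch of the standard biased (V-statistic) estimator with a Gaussian kernel; the kernel choice, the `bandwidth` parameter, and the function names are assumptions for illustration and do not reproduce the paper's formulation or its Wasserstein gradient flow.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples xs and ys.

    xs and ys are lists of vectors, e.g. observed and simulated images
    flattened to lists of floats.
    """
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

The estimate is zero when both samples are identical and grows as the two sample sets move apart, which is what makes it usable as a discrepancy to minimise over a particle ensemble.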


Probabilistic operator learning: generative modeling and uncertainty quantification for foundation models of differential equations

arXiv.org Machine Learning

In-context operator networks (ICON) are a class of operator learning methods based on the novel architectures of foundation models. Trained on a diverse set of datasets of initial and boundary conditions paired with corresponding solutions to ordinary and partial differential equations (ODEs and PDEs), ICON learns to map example condition-solution pairs of a given differential equation to an approximation of its solution operator. Here, we present a probabilistic framework that reveals ICON as implicitly performing Bayesian inference, where it computes the mean of the posterior predictive distribution over solution operators conditioned on the provided context, i.e., example condition-solution pairs. The formalism of random differential equations provides the probabilistic framework for describing the tasks ICON accomplishes while also providing a basis for understanding other multi-operator learning methods. This probabilistic perspective provides a basis for extending ICON to generative settings, where one can sample from the posterior predictive distribution of solution operators. The generative formulation of ICON (GenICON) captures the underlying uncertainty in the solution operator, which enables principled uncertainty quantification in the solution predictions in operator learning.


Nonnegative matrix factorization and the principle of the common cause

arXiv.org Machine Learning

Nonnegative matrix factorization (NMF) is a known unsupervised data-reduction method. The principle of the common cause (PCC) is a basic methodological approach in probabilistic causality, which seeks an independent mixture model for the joint probability of two dependent random variables. It turns out that these two concepts are closely related. This relationship is explored reciprocally for several datasets of gray-scale images, which are conveniently mapped into probability models. On one hand, PCC provides a predictability tool that leads to a robust estimation of the effective rank of NMF. Unlike other estimates (e.g., those based on the Bayesian Information Criterion), our estimate of the rank is stable against weak noise. We show that NMF implemented around this rank produces features (basis images) that are also stable against noise and against seeds of local optimization, thereby effectively resolving the NMF nonidentifiability problem. On the other hand, NMF provides an interesting possibility of implementing PCC in an approximate way, where larger and positively correlated joint probabilities tend to be explained better via the independent mixture model. We work out a clustering method, where data points with the same common cause are grouped into the same cluster. We also show how NMF can be employed for data denoising. Nonnegative matrix factorization (NMF) was proposed and developed in data science [1]-[3].


Online Clustering of Seafloor Imagery for Interpretation during Long-Term AUV Operations

arXiv.org Artificial Intelligence

As long-endurance and seafloor-resident AUVs become more capable, there is an increasing need for extended, real-time interpretation of seafloor imagery to enable adaptive missions and optimise communication efficiency. Although offline image analysis methods are well established, they rely on access to complete datasets and human-labelled examples to manage the strong influence of environmental and operational conditions on seafloor image appearance; these requirements cannot be met in real-time settings. To address this, we introduce an online clustering framework (OCF) capable of interpreting seafloor imagery without supervision, designed to operate in real-time on continuous data streams in a scalable, adaptive, and self-consistent manner. The method enables the efficient review and consolidation of common patterns across the entire data history in constant time by identifying and maintaining a set of representative samples that capture the evolving feature distribution, supporting dynamic cluster merging and splitting without reprocessing the full image history. We evaluate the framework on three diverse seafloor image datasets, analysing the impact of different representative sampling strategies on both clustering accuracy and computational cost. The OCF achieves the highest average F1 score of 0.68 across the three datasets among all comparative online clustering approaches, with a standard deviation of 3% across three distinct survey trajectories, demonstrating its superior clustering capability and robustness to trajectory variation. In addition, it maintains consistently lower and bounded computational time as the data volume increases. Compared to offline clustering methods, it strikes a favourable balance between accuracy and efficiency. These properties are beneficial for generating survey data summaries and supporting informative path planning in long-term, persistent autonomous marine exploration.