Goto

Collaborating Authors

 Uncertainty


FusWay: Multimodal hybrid fusion approach. Application to Railway Defect Detection

arXiv.org Artificial Intelligence

Multimodal fusion is a multimedia technique that has become popular in the wide range of tasks where image information is accompanied by a signal/audio. The latter may not convey highly semantic information, such as speech or music, but some measures such as audio signal recorded by mics in the goal to detect rail structure elements or defects. While classical detection approaches such as You Only Look Once (YOLO) family detectors can be efficiently deployed for defect detection on the image modality, the single modality approaches remain limited. They yield an overdetection in case of the appearance similar to normal structural elements. The paper proposes a new multimodal fusion architecture built on the basis of domain rules with YOLO and Vision transformer backbones. It integrates YOLOv8n for rapid object detection with a Vision Transformer (ViT) to combine feature maps extracted from multiple layers (7, 16, and 19) and synthesised audio representations for two defect classes: rail Rupture and Surface defect. Fusion is performed between audio and image. Experimental evaluation on a real-world railway dataset demonstrates that our multimodal fusion improves precision and overall accuracy by 0.2 points compared to the vision-only approach. Student's unpaired t-test also confirms statistical significance of differences in the mean accuracy.


Robust variational neural posterior estimation for simulation-based inference

arXiv.org Machine Learning

Recent advances in neural density estimation have enabled powerful simulation-based inference (SBI) methods that can flexibly approximate Bayesian inference for intractable stochastic models. Although these methods have demonstrated reliable posterior estimation when the simulator accurately represents the underlying data generative process (GDP), recent work has shown that they perform poorly in the presence of model misspecification. This poses a significant problem for their use on real-world problems, due to simulators always misrepresenting the true DGP to a certain degree. In this paper, we introduce robust variational neural posterior estimation (R VNP), a method which addresses the problem of misspecification in amortised SBI by bridging the simulation-to-reality gap using variational inference and error modelling. We test R VNP on multiple benchmark tasks, including using real data from astronomy, and show that it can recover robust posterior inference in a data-driven manner without adopting tunable hyperparameters or priors governing the misspecification.


Cryo-EM as a Stochastic Inverse Problem

arXiv.org Machine Learning

Cryo-electron microscopy (Cryo-EM) enables high-resolution imaging of biomolecules, but structural heterogeneity remains a major challenge in 3D reconstruction. Traditional methods assume a discrete set of conformations, limiting their ability to recover continuous structural variability. In this work, we formulate cryo-EM reconstruction as a stochastic inverse problem (SIP) over probability measures, where the observed images are modeled as the push-forward of an unknown distribution over molecular structures via a random forward operator. We pose the reconstruction problem as the minimization of a variational discrepancy between observed and simulated image distributions, using statistical distances such as the KL divergence and the Maximum Mean Discrepancy. The resulting optimization is performed over the space of probability measures via a Wasserstein gradient flow, which we numerically solve using particles to represent and evolve conformational ensembles. We validate our approach using synthetic examples, including a realistic protein model, which demonstrates its ability to recover continuous distributions over structural states. We analyze the connection between our formulation and Maximum A Posteriori (MAP) approaches, which can be interpreted as instances of the discretize-then-optimize (DTO) framework. We further provide a consistency analysis, establishing conditions under which DTO methods, such as MAP estimation, converge to the solution of the underlying infinite-dimensional continuous problem. Beyond cryo-EM, the framework provides a general methodology for solving SIPs involving random forward operators.


Probabilistic operator learning: generative modeling and uncertainty quantification for foundation models of differential equations

arXiv.org Machine Learning

In-context operator networks (ICON) are a class of operator learning methods based on the novel architectures of foundation models. Trained on a diverse set of datasets of initial and boundary conditions paired with corresponding solutions to ordinary and partial differential equations (ODEs and PDEs), ICON learns to map example condition-solution pairs of a given differential equation to an approximation of its solution operator. Here, we present a probabilistic framework that reveals ICON as implicitly performing Bayesian inference, where it computes the mean of the posterior predictive distribution over solution operators conditioned on the provided context, i.e., example condition-solution pairs. The formalism of random differential equations provides the probabilistic framework for describing the tasks ICON accomplishes while also providing a basis for understanding other multi-operator learning methods. This probabilistic perspective provides a basis for extending ICON to \emph{generative} settings, where one can sample from the posterior predictive distribution of solution operators. The generative formulation of ICON (GenICON) captures the underlying uncertainty in the solution operator, which enables principled uncertainty quantification in the solution predictions in operator learning.


Nonnegative matrix factorization and the principle of the common cause

arXiv.org Machine Learning

--Nonnegative matrix factorization (NMF) is a known unsupervised data-reduction method. The principle of the common cause (PCC) is a basic methodological approach in probabilistic causality, which seeks an independent mixture model for the joint probability of two dependent random variables. It turns out that these two concepts are closely related. This relationship is explored reciprocally for several datasets of gray-scale images, which are conveniently mapped into probability models. On one hand, PCC provides a predictability tool that leads to a robust estimation of the effective rank of NMF . Unlike other estimates (e.g., those based on the Bayesian Information Criteria), our estimate of the rank is stable against weak noise. We show that NMF implemented around this rank produces features (basis images) that are also stable against noise and against seeds of local optimization, thereby effectively resolving the NMF nonidentifiability problem. On the other hand, NMF provides an interesting possibility of implementing PCC in an approximate way, where larger and positively correlated joint probabilities tend to be explained better via the independent mixture model. We work out a clustering method, where data points with the same common cause are grouped into the same cluster . We also show how NMF can be employed for data denoising. Nonnegative matrix factorization (NMF) was proposed and developed in data science [1]-[3].


Data-Driven Stochastic Modeling Using Autoregressive Sequence Models: Translating Event Tables to Queueing Dynamics

arXiv.org Artificial Intelligence

While queueing network models are powerful tools for analyzing service systems, they traditionally require substantial human effort and domain expertise to construct. To make this modeling approach more scalable and accessible, we propose a data-driven framework for queueing network modeling and simulation based on autoregressive sequence models trained on event-stream data. Instead of explicitly specifying arrival processes, service mechanisms, or routing logic, our approach learns the conditional distributions of event types and event times, recasting the modeling task as a problem of sequence distribution learning. We show that Transformer-style architectures can effectively parameterize these distributions, enabling automated construction of high-fidelity simulators. As a proof of concept, we validate our framework on event tables generated from diverse queueing networks, showcasing its utility in simulation, uncertainty quantification, and counterfactual evaluation. Leveraging advances in artificial intelligence and the growing availability of data, our framework takes a step toward more automated, data-driven modeling pipelines to support broader adoption of queueing network models across service domains.


Online Identification of IT Systems through Active Causal Learning

arXiv.org Artificial Intelligence

Identifying a causal model of an IT system is fundamental to many branches of systems engineering and operation. Such a model can be used to predict the effects of control actions, optimize operations, diagnose failures, detect intrusions, etc., which is central to achieving the longstanding goal of automating network and system management tasks. Traditionally, causal models have been designed and maintained by domain experts. This, however, proves increasingly challenging with the growing complexity and dynamism of modern IT systems. In this paper, we present the first principled method for online, data-driven identification of an IT system in the form of a causal model. The method, which we call active causal learning, estimates causal functions that capture the dependencies among system variables in an iterative fashion using Gaussian process regression based on system measurements, which are collected through a rollout-based intervention policy. We prove that this method is optimal in the Bayesian sense and that it produces effective interventions. Experimental validation on a testbed shows that our method enables accurate identification of a causal system model while inducing low interference with system operations.


Hallucination Detection on a Budget: Efficient Bayesian Estimation of Semantic Entropy

arXiv.org Artificial Intelligence

Detecting whether an LLM hallucinates is an important research challenge. One promising way of doing so is to estimate the semantic entropy (Farquhar et al., 2024) of the distribution of generated sequences. We propose a new algorithm for doing that, with two main advantages. First, due to us taking the Bayesian approach, we achieve a much better quality of semantic entropy estimates for a given budget of samples from the LLM. Second, we are able to tune the number of samples adaptively so that `harder' contexts receive more samples. We demonstrate empirically that our approach systematically beats the baselines, requiring only 53% of samples used by Farquhar et al. (2024) to achieve the same quality of hallucination detection as measured by AUROC. Moreover, quite counterintuitively, our estimator is useful even with just one sample from the LLM.


Disentangling Interaction and Bias Effects in Opinion Dynamics of Large Language Models

arXiv.org Artificial Intelligence

Large Language Models are increasingly used to simulate human opinion dynamics, yet the effect of genuine interaction is often obscured by systematic biases. We present a Bayesian framework to disentangle and quantify three such biases: (i) a topic bias toward prior opinions in the training data; (ii) an agreement bias favoring agreement irrespective of the question; and (iii) an anchoring bias toward the initiating agent's stance. Applying this framework to multi-step dialogues reveals that opinion trajectories tend to quickly converge to a shared attractor, with the influence of the interaction fading over time, and the impact of biases differing between LLMs. In addition, we fine-tune an LLM on different sets of strongly opinionated statements (incl. misinformation) and demonstrate that the opinion attractor shifts correspondingly. Exposing stark differences between LLMs and providing quantitative tools to compare them to human subjects in the future, our approach highlights both chances and pitfalls in using LLMs as proxies for human behavior.


Fundamental bounds on efficiency-confidence trade-off for transductive conformal prediction

arXiv.org Machine Learning

Transductive conformal prediction addresses the simultaneous prediction for multiple data points. Given a desired confidence level, the objective is to construct a prediction set that includes the true outcomes with the prescribed confidence. We demonstrate a fundamental trade-off between confidence and efficiency in transductive methods, where efficiency is measured by the size of the prediction sets. Specifically, we derive a strict finite-sample bound showing that any non-trivial confidence level leads to exponential growth in prediction set size for data with inherent uncertainty. The exponent scales linearly with the number of samples and is proportional to the conditional entropy of the data. Additionally, the bound includes a second-order term, dispersion, defined as the variance of the log conditional probability distribution. We show that this bound is achievable in an idealized setting. Finally, we examine a special case of transductive prediction where all test data points share the same label. We show that this scenario reduces to the hypothesis testing problem with empirically observed statistics and provide an asymptotically optimal confidence predictor, along with an analysis of the error exponent.