Goto

Collaborating Authors

 Uncertainty


Embedded Universal Predictive Intelligence: a coherent framework for multi-agent learning

arXiv.org Artificial Intelligence

The standard theory of model-free reinforcement learning assumes that the environment dynamics are stationary and that agents are decoupled from their environment, such that policies are treated as being separate from the world they inhabit. This leads to theoretical challenges in the multi-agent setting where the non-stationarity induced by the learning of other agents demands prospective learning based on prediction models. To accurately model other agents, an agent must account for the fact that those other agents are, in turn, forming beliefs about it to predict its future behavior, motivating agents to model themselves as part of the environment. Here, building upon foundational work on universal artificial intelligence (AIXI), we introduce a mathematical framework for prospective learning and embedded agency centered on self-prediction, where Bayesian RL agents predict both future perceptual inputs and their own actions, and must therefore resolve epistemic uncertainty about themselves as part of the universe they inhabit. We show that in multi-agent settings, self-prediction enables agents to reason about others running similar algorithms, leading to new game-theoretic solution concepts and novel forms of cooperation unattainable by classical decoupled agents. Moreover, we extend the theory of AIXI, and study universally intelligent embedded agents which start from a Solomonoff prior. We show that these idealized agents can form consistent mutual predictions and achieve infinite-order theory of mind, potentially setting a gold standard for embedded multi-agent learning.


Bayesian Decentralized Decision-making for Multi-Robot Systems: Sample-efficient Estimation of Event Rates

arXiv.org Artificial Intelligence

Abstract-- Effective collective decision-making in swarm robotics often requires balancing exploration, communication and individual uncertainty estimation, especially in hazardous environments where direct measurements are limited or costly. We propose a decentralized Bayesian framework that enables a swarm of simple robots to identify the safer of two areas, each characterized by an unknown rate of hazardous events governed by a Poisson process. Robots employ a conjugate prior to gradually predict the times between events and derive confidence estimates to adapt their behavior . Our simulation results show that the robot swarm consistently chooses the correct area while reducing exposure to hazardous events by being sample-efficient. Compared to baseline heuristics, our proposed approach shows better performance in terms of safety and speed of convergence. The proposed scenario has potential to extend the current set of benchmarks in collective decision-making and our method has applications in adaptive risk-aware sampling and exploration in hazardous, dynamic environments. Collective decision-making under uncertainty is a fundamental challenge in multi-robot systems, including domains such as collective perception, environment classification, and spatial consensus [1]-[4]. Decentralized systems (e.g., robot swarms) operate under strict limitations on sensing, communication, and memory. Instead of sharing/storing complete observation histories, robots must maintain compact model representations of their knowledge. It is crucial to develop efficient strategies for collective decision-making, especially when observations are sparse, noisy [5], and gathered from stochastic processes [6]. This is typically characterized as a best-of-n problem [3], [7].


BiCQL-ML: A Bi-Level Conservative Q-Learning Framework for Maximum Likelihood Inverse Reinforcement Learning

arXiv.org Artificial Intelligence

Offline inverse reinforcement learning (IRL) aims to recover a reward function that explains expert behavior using only fixed demonstration data, without any additional online interaction. We propose BiCQL-ML, a policy-free offline IRL algorithm that jointly optimizes a reward function and a conservative Q-function in a bi-level framework, thereby avoiding explicit policy learning. The method alternates between (i) learning a conservative Q-function via Conservative Q-Learning (CQL) under the current reward, and (ii) updating the reward parameters to maximize the expected Q-values of expert actions while suppressing over-generalization to out-of-distribution actions. This procedure can be viewed as maximum likelihood estimation under a soft value matching principle. We provide theoretical guarantees that BiCQL-ML converges to a reward function under which the expert policy is soft-optimal. Empirically, we show on standard offline RL benchmarks that BiCQL-ML improves both reward recovery and downstream policy performance compared to existing offline IRL baselines.


Joint Estimation of Sea State and Vessel Parameters Using a Mass-Spring-Damper Equivalence Model

arXiv.org Artificial Intelligence

Real-time sea state estimation is vital for applications like shipbuilding and maritime safety. Traditional methods rely on accurate wave-vessel transfer functions to estimate wave spectra from onboard sensors. In contrast, our approach jointly estimates sea state and vessel parameters without needing prior transfer function knowledge, which may be unavailable or variable. We model the wave-vessel system using pseudo mass-spring-dampers and develop a dynamic model for the system. This method allows for recursive modeling of wave excitation as a time-varying input, relaxing prior works' assumption of a constant input. We derive statistically consistent process noise covariance and implement a square root cubature Kalman filter for sensor data fusion. Further, we derive the Posterior Cramer-Rao lower bound to evaluate estimator performance. Extensive Monte Carlo simulations and data from a high-fidelity validated simulator confirm that the estimated wave spectrum matches methods assuming complete transfer function knowledge.


Actionable and diverse counterfactual explanations incorporating domain knowledge and causal constraints

arXiv.org Artificial Intelligence

Counterfactual explanations enhance the actionable interpretability of machine learning models by identifying the minimal changes required to achieve a desired outcome of the model. However, existing methods often ignore the complex dependencies in real-world datasets, leading to unrealistic or impractical modifications. Motivated by cybersecurity applications in the email marketing domain, we propose a method for generating Diverse, Actionable, and kNowledge-Constrained Explanations (DANCE), which incorporates feature dependencies and causal constraints to ensure plausibility and real-world feasibility of counterfactuals. Our method learns linear and nonlinear constraints from data or integrates expert-provided dependency graphs, ensuring counterfactuals are plausible and actionable. By maintaining consistency with feature relationships, the method produces explanations that align with real-world constraints. Additionally, it balances plausibility, diversity, and sparsity, effectively addressing key limitations in existing algorithms. The work is developed based on a real-life case study with Freshmail, the largest email marketing company in Poland and supported by a joint R&D project Sendguard. Furthermore, we provide an extensive evaluation using 140 public datasets, which highlights its ability to generate meaningful, domain-relevant counterfactuals that outperform other existing approaches based on widely used metrics. The source code for reproduction of the results can be found in a GitHub repository we provide.


On Defining Neural Averaging

arXiv.org Artificial Intelligence

What does it even mean to average neural networks? We investigate the problem of synthesizing a single neural network from a collection of pretrained models, each trained on disjoint data shards, using only their final weights and no access to training data. In forming a definition of neural averaging, we take insight from model soup, which appears to aggregate multiple models into a singular model while enhancing generalization performance. In this work, we reinterpret model souping as a special case of a broader framework: Amortized Model Ensembling (AME) for neural averaging, a data-free meta-optimization approach that treats model differences as pseudogradients to guide neural weight updates. We show that this perspective not only recovers model soup but enables more expressive and adaptive ensembling strategies. Empirically, AME produces averaged neural solutions that outperform both individual experts and model soup baselines, especially in out-of-distribution settings. Our results suggest a principled and generalizable notion of data-free model weight aggregation and defines, in one sense, how to perform neural averaging.


ADNF-Clustering: An Adaptive and Dynamic Neuro-Fuzzy Clustering for Leukemia Prediction

arXiv.org Artificial Intelligence

Leukemia diagnosis and monitoring rely increasingly on high-throughput image data, yet conventional clustering methods lack the flexibility to accommodate evolving cellular patterns and quantify uncertainty in real time. We introduce Adaptive and Dynamic Neuro-Fuzzy Clustering, a novel streaming-capable framework that combines Convolutional Neural Network-based feature extraction with an online fuzzy clustering engine. ADNF initializes soft partitions via Fuzzy C-Means, then continuously updates micro-cluster centers, densities, and fuzziness parameters using a Fuzzy Temporal Index (FTI) that measures entropy evolution. A topology refinement stage performs density-weighted merging and entropy-guided splitting to guard against over- and under-segmentation. On the C-NMC leukemia microscopy dataset, our tool achieves a silhouette score of 0.51, demonstrating superior cohesion and separation over static baselines. The method's adaptive uncertainty modeling and label-free operation hold immediate potential for integration within the INFANT pediatric oncology network, enabling scalable, up-to-date support for personalized leukemia management.


A Dynamics-Informed Gaussian Process Framework for 2D Stochastic Navier-Stokes via Quasi-Gaussianity

arXiv.org Machine Learning

Yet a fundamental gap remains: while these methods depend critically on the choice of prior covariance kernel, most kernels are selected for computational convenience (e.g., Gaussian/RBF kernels) or generic smoothness assumptions (e.g., Mat ern) rather than being rigorously grounded in the system's long-time statistical structure. Recent breakthroughs in stochastic PDE theory now make it possible to close this gap, constructing priors directly from the invariant-measure geometry of the underlying dynamics. Recent work of Coe, Hairer, and Tolomeo [7] establishes a remarkable geometric property of the two-dimensional stochastic Navier-Stokes (2D SNS) equations: although the dynamics are highly nonlinear, their unique invariant measure is equivalent-in the sense of mutual absolute continuity-to the Gaussian invariant measure of the linearized Ornstein-Uhlenbeck (OU) process. Equivalence means the two measures share the same support, null sets, and typical events, differing only by a positive Radon-Nikodym derivative. This reveals that the equilibrium statistical geometry is Gaussian, even when individual realizations are not.


Maxitive Donsker-Varadhan Formulation for Possibilistic Variational Inference

arXiv.org Machine Learning

V ariational inference (VI) is a cornerstone of modern Bayesian learning, enabling approximate inference in complex models that would otherwise be intractable. However, its formulation depends on expectations and divergences defined through high-dimensional integrals, often rendering analytical treatment impossible and necessitating heavy reliance on approximate learning and inference techniques. Possibility theory, an imprecise probability framework, allows to directly model epistemic uncertainty instead of leveraging subjective probabilities. While this framework provides robustness and interpretability under sparse or imprecise information, adapting VI to the possibilistic setting requires rethinking core concepts such as entropy and divergence, which presuppose additivity. In this work, we develop a principled formulation of possibilistic variational inference and apply it to a special class of exponential-family functions, highlighting parallels with their probabilistic counterparts and revealing the distinctive mathematical structures of possibility theory.


Selecting Belief-State Approximations in Simulators with Latent States

arXiv.org Machine Learning

State resetting is a fundamental but often overlooked capability of simulators. It supports sample-based planning by allowing resets to previously encountered simulation states, and enables calibration of simulators using real data by resetting to states observed in real-system traces. While often taken for granted, state resetting in complex simulators can be nontrivial: when the simulator comes with latent variables (states), state resetting requires sampling from the posterior over the latent state given the observable history, a.k.a. the belief state (Silver and Veness, 2010). While exact sampling is often infeasible, many approximate belief-state samplers can be constructed, raising the question of how to select among them using only sampling access to the simulator. In this paper, we show that this problem reduces to a general conditional distribution-selection task and develop a new algorithm and analysis under sampling-only access. Building on this reduction, the belief-state selection problem admits two different formulations: latent state-based selection, which directly targets the conditional distribution of the latent state, and observation-based selection, which targets the induced distribution over the observation. Interestingly, these formulations differ in how their guarantees interact with the downstream roll-out methods: perhaps surprisingly, observation-based selection may fail under the most natural roll-out method (which we call Single-Reset) but enjoys guarantees under the less conventional alternative (which we call Repeated-Reset). Together with discussion on issues such as distribution shift and the choice of sampling policies, our paper reveals a rich landscape of algorithmic choices, theoretical nuances, and open questions, in this seemingly simple problem.