Goto

Collaborating Authors

 Markov Models


Colored Markov Random Fields for Probabilistic Topological Modeling

arXiv.org Machine Learning

Probabilistic Graphical Models (PGMs) encode conditional dependencies among random variables using a graph -nodes for variables, links for dependencies- and factorize the joint distribution into lower-dimensional components. This makes PGMs well-suited for analyzing complex systems and supporting decision-making. Recent advances in topological signal processing highlight the importance of variables defined on topological spaces in several application domains. In such cases, the underlying topology shapes statistical relationships, limiting the expressiveness of canonical PGMs. To overcome this limitation, we introduce Colored Markov Random Fields (CMRFs), which model both conditional and marginal dependencies among Gaussian edge variables on topological spaces, with a theoretical foundation in Hodge theory. CMRFs extend classical Gaussian Markov Random Fields by including link coloring: connectivity encodes conditional independence, while color encodes marginal independence. We quantify the benefits of CMRFs through a distributed estimation case study over a physical network, comparing it with baselines with different levels of topological prior.


Multi-Agent Reinforcement Learning with Communication-Constrained Priors

arXiv.org Artificial Intelligence

Communication is one of the effective means to improve the learning of cooperative policy in multi-agent systems. However, in most real-world scenarios, lossy communication is a prevalent issue. Existing multi-agent reinforcement learning with communication, due to their limited scalability and robustness, struggles to apply to complex and dynamic real-world environments. To address these challenges, we propose a generalized communication-constrained model to uniformly characterize communication conditions across different scenarios. Based on this, we utilize it as a learning prior to distinguish between lossy and lossless messages for specific scenarios. Additionally, we decouple the impact of lossy and lossless messages on distributed decision-making, drawing on a dual mutual information estimatior, and introduce a communication-constrained multi-agent reinforcement learning framework, quantifying the impact of communication messages into the global reward. Finally, we validate the effectiveness of our approach across several communication-constrained benchmarks.


Prior preferences in active inference agents: soft, hard, and goal shaping

arXiv.org Artificial Intelligence

Active inference proposes expected free energy as an objective for planning and decision-making to adequately balance exploitative and explorative drives in learning agents. The exploitative drive, or what an agent wants to achieve, is formalised as the Kullback-Leibler divergence between a variational probability distribution, updated at each inference step, and a preference probability distribution that indicates what states or observations are more likely for the agent, hence determining the agent's goal in a certain environment. In the literature, the questions of how the preference distribution should be specified and of how a certain specification impacts inference and learning in an active inference agent have been given hardly any attention. In this work, we consider four possible ways of defining the preference distribution, either providing the agents with hard or soft goals and either involving or not goal shaping (i.e., intermediate goals). We compare the performances of four agents, each given one of the possible preference distributions, in a grid world navigation task. Our results show that goal shaping enables the best performance overall (i.e., it promotes exploitation) while sacrificing learning about the environment's transition dynamics (i.e., it hampers exploration).


A Multi-Agent, Policy-Gradient approach to Network Routing

arXiv.org Artificial Intelligence

Network routing is a distributed decision problem which naturally admits numerical performance measures, such as the average time for a packet to travel from source to destination. OLPOMDP, a policy-gradient reinforcement learning algorithm, was successfully applied to simulated network routing under a number of network models. Multiple distributed agents (routers) learned co-operative behavior without explicit inter-agent communication, and they avoided behavior which was individually desirable, but detrimental to the group's overall performance. Furthermore, shaping the reward signal by explicitly penalizing certain patterns of sub-optimal behavior was found to dramatically improve the convergence rate.


Multi-Agent Reinforcement Learning and Real-Time Decision-Making in Robotic Soccer for Virtual Environments

arXiv.org Artificial Intelligence

The deployment of multi-agent systems in dynamic, adversarial environments like robotic soccer necessitates real-time decision-making, sophisticated cooperation, and scalable algorithms to avoid the curse of dimensionality. While Reinforcement Learning (RL) offers a promising framework, existing methods often struggle with the multi-granularity of tasks (long-term strategy vs. instant actions) and the complexity of large-scale agent interactions. This paper presents a unified Multi-Agent Reinforcement Learning (MARL) framework that addresses these challenges. First, we establish a baseline using Proximal Policy Optimization (PPO) within a client-server architecture for real-time action scheduling, with PPO demonstrating superior performance (4.32 avg. goals, 82.9% ball control). Second, we introduce a Hierarchical RL (HRL) structure based on the options framework to decompose the problem into a high-level trajectory planning layer (modeled as a Semi-Markov Decision Process) and a low-level action execution layer, improving global strategy (avg. goals increased to 5.26). Finally, to ensure scalability, we integrate mean-field theory into the HRL framework, simplifying many-agent interactions into a single agent vs. the population average. Our mean-field actor-critic method achieves a significant performance boost (5.93 avg. goals, 89.1% ball control, 92.3% passing accuracy) and enhanced training stability. Extensive simulations of 4v4 matches in the Webots environment validate our approach, demonstrating its potential for robust, scalable, and cooperative behavior in complex multi-agent domains.


Risk-Entropic Flow Matching

arXiv.org Artificial Intelligence

Tilted (entropic) risk, obtained by applying a log-exponential transform to a base loss, is a well established tool in statistics and machine learning for emphasizing rare or high loss events while retaining a tractable optimization problem. In this work, our aim is to interpret its structure for Flow Matching (FM). FM learns a velocity field that transports samples from a simple source distribution to data by integrating an ODE. In rectified FM, training pairs are obtained by linearly interpolating between a source sample and a data sample, and a neural velocity field is trained to predict the straight line displacement using a mean squared error loss. This squared loss collapses all velocity targets that reach the same space-time point into a single conditional mean, thereby ignoring higher order conditional information (variance, skewness, multi-modality) that encodes fine geometric structure about the data manifold and minority branches. We apply the standard risk-sensitive (log-exponential) transform to the conditional FM loss and show that the resulting tilted risk loss is a natural upper-bound on a meaningful conditional entropic FM objective defined at each space-time point. Furthermore, we show that a small order expansion of the gradient of this conditional entropic objective yields two interpretable first order corrections: covariance preconditioning of the FM residual, and a skew tail term that favors asymmetric or rare branches. On synthetic data designed to probe ambiguity and tails, the resulting risk-sensitive loss improves statistical metrics and recovers geometric structure more faithfully than standard rectified FM.


Ergodic Risk Measures: Towards a Risk-Aware Foundation for Continual Reinforcement Learning

arXiv.org Artificial Intelligence

Continual reinforcement learning (continual RL) seeks to formalize the notions of lifelong learning and endless adaptation in RL. In particular, the aim of continual RL is to develop RL agents that can maintain a careful balance between retaining useful information and adapting to new situations. To date, continual RL has been explored almost exclusively through the lens of risk-neutral decision-making, in which the agent aims to optimize the expected long-run performance. In this work, we present the first formal theoretical treatment of continual RL through the lens of risk-aware decision-making, in which the behaviour of the agent is directed towards optimizing a measure of long-run performance beyond the mean. In particular, we show that the classical theory of risk measures, widely used as a theoretical foundation in non-continual risk-aware RL, is, in its current form, incompatible with continual learning. Then, building on this insight, we extend risk measure theory into the continual setting by introducing a new class of ergodic risk measures that are compatible with continual learning. Finally, we provide a case study of risk-aware continual learning, along with empirical results, which show the intuitive appeal of ergodic risk measures in continual settings.


Can Artificial Intelligence solve the blockchain oracle problem? Unpacking the Challenges and Possibilities

arXiv.org Artificial Intelligence

The blockchain oracle problem, which refers to the challenge of injecting reliable external data into decentralized systems, remains a fundamental limitation to the development of trustless applications. While recent years have seen a proliferation of architectural, cryptographic, and economic strategies to mitigate this issue, no one has yet fully resolved the fundamental question of how a blockchain can gain knowledge about the off-chain world. In this position paper, we critically assess the role artificial intelligence (AI) can play in tackling the oracle problem. Drawing from both academic literature and practitioner implementations, we examine how AI techniques such as anomaly detection, language-based fact extraction, dynamic reputation modeling, and adversarial resistance can enhance oracle systems. We observe that while AI introduces powerful tools for improving data quality, source selection, and system resilience, it cannot eliminate the reliance on unverifiable off-chain inputs. Therefore, this study supports the idea that AI should be understood as a complementary layer of inference and filtering within a broader oracle design, not a substitute for trust assumptions.


Tempering the Bayes Filter towards Improved Model-Based Estimation

arXiv.org Machine Learning

Model-based filtering is often carried out while subject to an imperfect model, as learning partially-observable stochastic systems remains a challenge. Recent work on Bayesian inference found that tempering the likelihood or full posterior of an imperfect model can improve predictive accuracy, as measured by expected negative log likelihood. In this paper, we develop the tempered Bayes filter, improving estimation performance through both of the aforementioned, and one newly introduced, modalities. The result admits a recursive implementation with a computational complexity no higher than that of the original Bayes filter. Our analysis reveals that -- besides the well-known fact in the field of Bayesian inference that likelihood tempering affects the balance between prior and likelihood -- full-posterior tempering tunes the level of entropy in the final belief distribution. We further find that a region of the tempering space can be understood as interpolating between the Bayes- and MAP filters, recovering these as special cases. Analytical results further establish conditions under which a tempered Bayes filter achieves improved predictive performance. Specializing the results to the linear Gaussian case, we obtain the tempered Kalman filter. In this context, we interpret how the parameters affect the Kalman state estimate and covariance propagation. Empirical results confirm that our method consistently improves predictive accuracy over the Bayes filter baseline.


Unlocking the Power of Boltzmann Machines by Parallelizable Sampler and Efficient Temperature Estimation

arXiv.org Machine Learning

Boltzmann machines (BMs) are powerful energy-based generative models, but their heavy training cost has largely confined practical use to Restricted BMs (RBMs) trained with an efficient learning method called contrastive divergence. More accurate learning typically requires Markov chain Monte Carlo (MCMC) Boltzmann sampling, but it is time-consuming due to the difficulty of parallelization for more expressive models. To address this limitation, we first propose a new Boltzmann sampler inspired by a quantum-inspired combinatorial optimization called simulated bifurcation (SB). This SB-inspired approach, which we name Langevin SB (LSB), enables parallelized sampling while maintaining accuracy comparable to MCMC. Furthermore, this is applicable not only to RBMs but also to BMs with general couplings. However, LSB cannot control the inverse temperature of the output Boltzmann distribution, which hinders learning and degrades performance. To overcome this limitation, we also developed an efficient method for estimating the inverse temperature during the learning process, which we call conditional expectation matching (CEM). By combining LSB and CEM, we establish an efficient learning framework for BMs with greater expressive power than RBMs. We refer to this framework as sampler-adaptive learning (SAL). SAL opens new avenues for energy-based generative modeling beyond RBMs.