Goto

Collaborating Authors

 Markov Models


Approximate learning of parsimonious Bayesian context trees

arXiv.org Machine Learning

Models for categorical sequences typically assume exchangeable or first-order dependent sequence elements. These are common assumptions, for example, in models of computer malware traces and protein sequences. Although such simplifying assumptions lead to computational tractability, these models fail to capture long-range, complex dependence structures that may be harnessed for greater predictive power. To this end, a Bayesian modelling framework is proposed to parsimoniously capture rich dependence structures in categorical sequences, with memory efficiency suitable for real-time processing of data streams. Parsimonious Bayesian context trees are introduced as a form of variable-order Markov model with conjugate prior distributions. The novel framework requires fewer parameters than fixed-order Markov models by dropping redundant dependencies and clustering sequential contexts. Approximate inference on the context tree structure is performed via a computationally efficient model-based agglomerative clustering procedure. The proposed framework is tested on synthetic and real-world data examples, and it outperforms existing sequence models when fitted to real protein sequences and honeypot computer terminal sessions.


Online Planning in POMDPs with State-Requests

arXiv.org Artificial Intelligence

In key real-world problems, full state information is sometimes available but only at a high cost, like activating precise yet energy-intensive sensors or consulting humans, thereby compelling the agent to operate under partial observability. For this scenario, we propose AEMS-SR (Anytime Error Minimization Search with State Requests), a principled online planning algorithm tailored for POMDPs with state requests. By representing the search space as a graph instead of a tree, AEMS-SR avoids the exponential growth of the search space originating from state requests. Theoretical analysis demonstrates AEMS-SR's $\varepsilon$-optimality, ensuring solution quality, while empirical evaluations illustrate its effectiveness compared with AEMS and POMCP, two SOTA online planning algorithms. AEMS-SR enables efficient planning in domains characterized by partial observability and costly state requests offering practical benefits across various applications.


Reinforcement Learning for Sustainable Energy: A Survey

arXiv.org Machine Learning

The transition to sustainable energy is a key challenge of our time, requiring modifications in the entire pipeline of energy production, storage, transmission, and consumption. At every stage, new sequential decision-making challenges emerge, ranging from the operation of wind farms to the management of electrical grids or the scheduling of electric vehicle charging stations. All such problems are well suited for reinforcement learning, the branch of machine learning that learns behavior from data. Therefore, numerous studies have explored the use of reinforcement learning for sustainable energy. This paper surveys this literature with the intention of bridging both the underlying research communities: energy and machine learning. After a brief introduction of both fields, we systematically list relevant sustainability challenges, how they can be modeled as a reinforcement learning problem, and what solution approaches currently exist in the literature. Afterwards, we zoom out and identify overarching reinforcement learning themes that appear throughout sustainability, such as multi-agent, offline, and safe reinforcement learning. Lastly, we also cover standardization of environments, which will be crucial for connecting both research fields, and highlight potential directions for future work. In summary, this survey provides an extensive overview of reinforcement learning methods for sustainable energy, which may play a vital role in the energy transition.


SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments

arXiv.org Artificial Intelligence

This work compares ways of extending Reinforcement Learning algorithms to Partially Observed Markov Decision Processes (POMDPs) with options. One view of options is as temporally extended action, which can be realized as a memory that allows the agent to retain historical information beyond the policy's context window. While option assignment could be handled using heuristics and hand-crafted objectives, learning temporally consistent options and associated sub-policies without explicit supervision is a challenge. Two algorithms, PPOEM and SOAP, are proposed and studied in depth to address this problem. PPOEM applies the forward-backward algorithm (for Hidden Markov Models) to optimize the expected returns for an option-augmented policy. However, this learning approach is unstable during on-policy rollouts. It is also unsuited for learning causal policies without the knowledge of future trajectories, since option assignments are optimized for offline sequences where the entire episode is available. As an alternative approach, SOAP evaluates the policy gradient for an optimal option assignment. It extends the concept of the generalized advantage estimation (GAE) to propagate option advantages through time, which is an analytical equivalent to performing temporal back-propagation of option policy gradients. This option policy is only conditional on the history of the agent, not future actions. Evaluated against competing baselines, SOAP exhibited the most robust performance, correctly discovering options for POMDP corridor environments, as well as on standard benchmarks including Atari and MuJoCo, outperforming PPOEM, as well as LSTM and Option-Critic baselines. The open-sourced code is available at https://github.com/shuishida/SoapRL.


Log-Concave Coupling for Sampling Neural Net Posteriors

arXiv.org Machine Learning

In this work, we present a sampling algorithm for single hidden layer neural networks. This algorithm is built upon a recursive series of Bayesian posteriors using a method we call Greedy Bayes. Sampling of the Bayesian posterior for neuron weight vectors $w$ of dimension $d$ is challenging because of its multimodality. Our algorithm to tackle this problem is based on a coupling of the posterior density for $w$ with an auxiliary random variable $\xi$. The resulting reverse conditional $w|\xi$ of neuron weights given auxiliary random variable is shown to be log concave. In the construction of the posterior distributions we provide some freedom in the choice of the prior. In particular, for Gaussian priors on $w$ with suitably small variance, the resulting marginal density of the auxiliary variable $\xi$ is proven to be strictly log concave for all dimensions $d$. For a uniform prior on the unit $\ell_1$ ball, evidence is given that the density of $\xi$ is again strictly log concave for sufficiently large $d$. The score of the marginal density of the auxiliary random variable $\xi$ is determined by an expectation over $w|\xi$ and thus can be computed by various rapidly mixing Markov Chain Monte Carlo methods. Moreover, the computation of the score of $\xi$ permits methods of sampling $\xi$ by a stochastic diffusion (Langevin dynamics) with drift function built from this score. With such dynamics, information-theoretic methods pioneered by Bakry and Emery show that accurate sampling of $\xi$ is obtained rapidly when its density is indeed strictly log-concave. After which, one more draw from $w|\xi$, produces neuron weights $w$ whose marginal distribution is from the desired posterior.


Learning mental states estimation through self-observation: a developmental synergy between intentions and beliefs representations in a deep-learning model of Theory of Mind

arXiv.org Artificial Intelligence

Theory of Mind (ToM), the ability to attribute beliefs, intentions, or mental states to others, is a crucial feature of human social interaction. In complex environments, where the human sensory system reaches its limits, behaviour is strongly driven by ou r beliefs about the state of the world around us. Accessing others' mental states, e.g., beliefs and intentions, allows for more effective social interactions in natural contexts. Yet, these variables are not directly observable, making understanding ToM a challenging quest of interest for different fields, including psychology, machine learning and robotics. In this paper, we contribute to this topic by showing a developmental synergy between learning to predict low - level mental states (e.g., intentions, g oals) and attributing high - level ones (i.e., beliefs). Specifically, we assume that learning beliefs attribution can occur by observing one's own decision processes involving beliefs, e.g., in a partially observable environment. Using a simple feed - forward deep learning model, we show that, when learning to predict others' intentions and actions, more accurate predictions can be acquired earlier if beliefs attribution is learnt simultaneously. Furthermore, we show that the learning performance improves even when observed actors have a different embodiment than the observer and the gain is higher when observing beliefs - driven chunks of behaviour. We propose that our computational approach can inform the understanding of human social cognitive development and be relevant for the design of future adaptive social robots able to autonomously understand, assist, and learn from human interaction partners in novel natural environments and tasks.


Principal-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Contracts are the economic framework which allows a principal to delegate a task to an agent -- despite misaligned interests, and even without directly observing the agent's actions. In many modern reinforcement learning settings, self-interested agents learn to perform a multi-stage task delegated to them by a principal. We explore the significant potential of utilizing contracts to incentivize the agents. We model the delegated task as an MDP, and study a stochastic game between the principal and agent where the principal learns what contracts to use, and the agent learns an MDP policy in response. We present a learning-based algorithm for optimizing the principal's contracts, which provably converges to the subgame-perfect equilibrium of the principal-agent game. A deep RL implementation allows us to apply our method to very large MDPs with unknown transition dynamics. We extend our approach to multiple agents, and demonstrate its relevance to resolving a canonical sequential social dilemma with minimal intervention to agent rewards.


Estimating the number of clusters of a Block Markov Chain

arXiv.org Machine Learning

Clustering algorithms frequently require the number of clusters to be chosen in advance, but it is usually not clear how to do this. To tackle this challenge when clustering within sequential data, we present a method for estimating the number of clusters when the data is a trajectory of a Block Markov Chain. Block Markov Chains are Markov Chains that exhibit a block structure in their transition matrix. The method considers a matrix that counts the number of transitions between different states within the trajectory, and transforms this into a spectral embedding whose dimension is set via singular value thresholding. The number of clusters is subsequently estimated via density-based clustering of this spectral embedding, an approach inspired by literature on the Stochastic Block Model. By leveraging and augmenting recent results on the spectral concentration of random matrices with Markovian dependence, we show that the method is asymptotically consistent - in spite of the dependencies between the count matrix's entries, and even when the count matrix is sparse. We also present a numerical evaluation of our method, and compare it to alternatives.


Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences

arXiv.org Artificial Intelligence

Off-policy evaluation (OPE) is widely applied in sectors such as pharmaceuticals and e-commerce to evaluate the efficacy of novel products or policies from offline datasets. This paper introduces a causal deepset framework that relaxes several key structural assumptions, primarily the mean-field assumption, prevalent in existing OPE methodologies that handle spatio-temporal interference. These traditional assumptions frequently prove inadequate in real-world settings, thereby restricting the capability of current OPE methods to effectively address complex interference effects. In response, we advocate for the implementation of the permutation invariance (PI) assumption. This innovative approach enables the data-driven, adaptive learning of the mean-field function, offering a more flexible estimation method beyond conventional averaging. Furthermore, we present novel algorithms that incorporate the PI assumption into OPE and thoroughly examine their theoretical foundations. Our numerical analyses demonstrate that this novel approach yields significantly more precise estimations than existing baseline algorithms, thereby substantially improving the practical applicability and effectiveness of OPE methodologies.


Multi-Agent Deep Reinforcement Learning for Resilience Optimization in 5G RAN

arXiv.org Artificial Intelligence

Resilience is defined as the ability of a network to resist, adapt, and quickly recover from disruptions, and to continue to maintain an acceptable level of services from users' perspective. With the advent of future radio networks, including advanced 5G and upcoming 6G, critical services become integral to future networks, requiring uninterrupted service delivery for end users. Unfortunately, with the growing network complexity, user mobility and diversity, it becomes challenging to scale current resilience management techniques that rely on local optimizations to large dense network deployments. This paper aims to address this problem by globally optimizing the resilience of a dense multi-cell network based on multi-agent deep reinforcement learning. Specifically, our proposed solution can dynamically tilt cell antennas and reconfigure transmit power to mitigate outages and increase both coverage and service availability. A multi-objective optimization problem is formulated to simultaneously satisfy resiliency constraints while maximizing the service quality in the network area in order to minimize the impact of outages on neighbouring cells. Extensive simulations then demonstrate that with our proposed solution, the average service availability in terms of user throughput can be increased by up to 50-60% on average, while reaching a coverage availability of 99% in best cases.