 Markov Models


Deep Transformer Q-Networks for Partially Observable Reinforcement Learning

arXiv.org Artificial Intelligence

Real-world reinforcement learning tasks often involve some form of partial observability, where the observations give only a partial or noisy view of the true state of the world. Such tasks typically require some form of memory, where the agent has access to multiple past observations, in order to perform well. One popular way to incorporate memory is to use a recurrent neural network to access the agent's history. However, recurrent neural networks in reinforcement learning are often fragile and difficult to train, are susceptible to catastrophic forgetting, and sometimes fail completely as a result. In this work, we propose Deep Transformer Q-Networks (DTQN), a novel architecture utilizing transformers and self-attention to encode an agent's history. DTQN is designed modularly, and we compare results against several modifications to our base model. Our experiments demonstrate that the transformer can solve partially observable tasks faster and more stably than previous recurrent approaches.
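The idea of encoding an agent's history with self-attention and reading Q-values off the result can be illustrated with a minimal sketch. This is not the DTQN architecture itself: it is a single causal attention head in plain Python, with no positional encoding, layer stacking, or training loop. The names `encode_history` and `q_values`, and the weight matrices passed in, are hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in W]


def encode_history(history, Wq, Wk, Wv):
    """Single-head causal self-attention over observation embeddings.

    history: list of observation vectors, oldest first (assumed pre-embedded).
    Each output at step t attends only to steps 0..t.
    """
    queries = [matvec(Wq, h) for h in history]
    keys = [matvec(Wk, h) for h in history]
    values = [matvec(Wv, h) for h in history]
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for t, q in enumerate(queries):
        # Causal mask: only past and current steps are scored.
        scores = [dot(q, keys[s]) / scale for s in range(t + 1)]
        weights = softmax(scores)
        dim = len(values[0])
        outputs.append(
            [sum(w * values[s][i] for s, w in enumerate(weights)) for i in range(dim)]
        )
    return outputs


def q_values(history, Wq, Wk, Wv, W_out):
    """Map each encoded step to one Q-value per action (rows of W_out)."""
    return [matvec(W_out, e) for e in encode_history(history, Wq, Wk, Wv)]
```

In this sketch the last entry of `q_values(...)` would be the one used to pick an action, since it summarizes the whole observed history; producing a Q-vector for every step is a design choice of the sketch.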


Quantifying Complexity: An Object-Relations Approach to Complex Systems

arXiv.org Artificial Intelligence

The best way to model, understand, and quantify the information contained in complex systems is an open question in physics, mathematics, and computer science. The uncertain relationship between entropy and complexity further complicates this question. With ideas drawn from the object-relations theory of psychology, this paper develops an object-relations model of complex systems which generalizes to systems of all types, including mathematical operations, machines, biological organisms, and social structures. The resulting Complex Information Entropy (CIE) equation is a robust method to quantify complexity across various contexts. The paper also describes algorithms to iteratively update and improve approximate solutions to the CIE equation, to recursively infer the composition of complex systems, and to discover the connections among objects across different lengthscales and timescales. Applications are discussed in the fields of engineering design, atomic and molecular physics, chemistry, materials science, neuroscience, psychology, sociology, ecology, economics, and medicine.


Active Exploration via Experiment Design in Markov Chains

arXiv.org Artificial Intelligence

A key challenge in science and engineering is to design experiments to learn about some unknown quantity of interest. Classical experimental design optimally allocates the experimental budget to maximize a notion of utility (e.g., reduction in uncertainty about the unknown quantity). We consider a rich setting, where the experiments are associated with states in a Markov chain, and we can only choose them by selecting a policy controlling the state transitions. This problem captures important applications, from exploration in reinforcement learning to spatial monitoring tasks. We propose an algorithm, markov-design, that efficiently selects policies whose measurement allocation provably converges to the optimal one. The algorithm is sequential in nature, adapting its choice of policies (experiments) informed by past measurements. In addition to our theoretical analysis, we showcase our framework on applications in ecological surveillance and pharmacology.


Foundation Models for Semantic Novelty in Reinforcement Learning

arXiv.org Artificial Intelligence

Effectively exploring the environment is a key challenge in reinforcement learning (RL). We address this challenge by defining a novel intrinsic reward based on a foundation model, such as contrastive language image pretraining (CLIP), which can encode a wealth of domain-independent semantic visual-language knowledge about the world. Specifically, our intrinsic reward is defined based on pre-trained CLIP embeddings without any fine-tuning or learning on the target RL task. We demonstrate that CLIP-based intrinsic rewards can drive exploration towards semantically meaningful states and outperform state-of-the-art methods in challenging sparse-reward procedurally-generated environments.


Heterogeneous Hidden Markov Models for Sleep Activity Recognition from Multi-Source Passively Sensed Data

arXiv.org Artificial Intelligence

Passive activity monitoring of psychiatric patients is crucial for detecting behavioural shifts in real time; it gives clinicians a tool to supervise patients' evolution over time and to improve treatment outcomes. Sleep disturbances and mental health deterioration are frequently closely related, as worsening mental health regularly entails shifts in patients' circadian rhythms. Therefore, Sleep Activity Recognition constitutes a behavioural marker to portray patients' activity cycles and to detect behavioural changes among them. Moreover, passively sensed mobile data captured from smartphones, thanks to these devices' ubiquity, constitute an excellent alternative for profiling patients' biorhythms. In this work, we aim to identify major sleep episodes based on passively sensed data. To do so, a Heterogeneous Hidden Markov Model is proposed to model a discrete latent variable process associated with the Sleep Activity Recognition task in a self-supervised way. We validate our results against sleep metrics reported by clinically tested wearables, demonstrating the effectiveness of the proposed approach.
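As a rough illustration of how a hidden Markov model turns a stream of passively sensed observations into sleep episodes, the sketch below decodes the most likely awake/asleep sequence with the standard Viterbi algorithm. It is a plain discrete HMM, not the heterogeneous model proposed in the paper, and every state name, observation symbol, and probability here is a made-up assumption. All probabilities are assumed to be strictly positive so that logarithms are defined.

```python
import math

# Toy parameters, invented for illustration only.
STATES = ("awake", "asleep")
START_P = {"awake": 0.6, "asleep": 0.4}
TRANS_P = {
    "awake": {"awake": 0.8, "asleep": 0.2},
    "asleep": {"awake": 0.2, "asleep": 0.8},
}
EMIT_P = {
    "awake": {"phone_active": 0.7, "phone_idle": 0.3},
    "asleep": {"phone_active": 0.05, "phone_idle": 0.95},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for a discrete HMM."""
    # Log space avoids numerical underflow on long sequences.
    best = [
        {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}
    ]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for landing in s at this step.
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

A call such as `viterbi(["phone_active", "phone_idle", "phone_idle"], STATES, START_P, TRANS_P, EMIT_P)` returns one label per observation; contiguous runs of `"asleep"` would then be read as candidate sleep episodes.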


Formalizing the Problem of Side Effect Regularization

arXiv.org Artificial Intelligence

AI objectives are often hard to specify properly. Some approaches tackle this problem by regularizing the AI's side effects: agents must trade off "how much of a mess they make" against an imperfectly specified proxy objective. We propose a formal criterion for side effect regularization via the assistance game framework. In these games, the agent solves a partially observable Markov decision process (POMDP) representing its uncertainty about the objective function it should optimize. We consider the setting where the true objective is revealed to the agent at a later time step. We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks. We empirically demonstrate the reasonableness of our problem formalization via ground-truth evaluation in two gridworld environments.


Learning to Follow Instructions in Text-Based Games

arXiv.org Artificial Intelligence

Text-based games present a unique class of sequential decision making problem in which agents interact with a partially observable, simulated environment via actions and observations conveyed through natural language. Such observations typically include instructions that, in a reinforcement learning (RL) setting, can directly or indirectly guide a player towards completing reward-worthy tasks. In this work, we study the ability of RL agents to follow such instructions. We conduct experiments that show that the performance of state-of-the-art text-based game agents is largely unaffected by the presence or absence of such instructions, and that these agents are typically unable to execute tasks to completion. To further study and address the task of instruction following, we equip RL agents with an internal structured representation of natural language instructions in the form of Linear Temporal Logic (LTL), a formal language that is increasingly used for temporally extended reward specification in RL. Our framework both supports and highlights the benefit of understanding the temporal semantics of instructions and in measuring progress towards achievement of such a temporally extended behaviour. Experiments with 500+ games in TextWorld demonstrate the superior performance of our approach.
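One common way to measure progress towards a temporally extended goal written in LTL is formula progression: after each step, the formula is rewritten into what still remains to be satisfied. The sketch below is a minimal illustration of that general technique, not the paper's framework. The tuple encoding of formulas, the proposition names, and the example task are all hypothetical, and negation is omitted for brevity.

```python
TRUE, FALSE = ("true",), ("false",)


def simplify_and(a, b):
    if a == FALSE or b == FALSE:
        return FALSE
    if a == TRUE:
        return b
    if b == TRUE:
        return a
    return ("and", a, b)


def simplify_or(a, b):
    if a == TRUE or b == TRUE:
        return TRUE
    if a == FALSE:
        return b
    if b == FALSE:
        return a
    return ("or", a, b)


def progress(formula, true_props):
    """Rewrite `formula` given the set of propositions true at this step."""
    op = formula[0]
    if op in ("true", "false"):
        return formula
    if op == "prop":
        return TRUE if formula[1] in true_props else FALSE
    if op == "and":
        return simplify_and(progress(formula[1], true_props),
                            progress(formula[2], true_props))
    if op == "or":
        return simplify_or(progress(formula[1], true_props),
                           progress(formula[2], true_props))
    if op == "next":
        return formula[1]
    if op == "eventually":
        # Either satisfied now, or still owed in the future.
        return simplify_or(progress(formula[1], true_props), formula)
    if op == "until":
        # Right side holds now, or left side holds now and the until persists.
        return simplify_or(
            progress(formula[2], true_props),
            simplify_and(progress(formula[1], true_props), formula),
        )
    raise ValueError(f"unknown operator: {op}")


# Hypothetical instruction: eventually take the carrot, and then eventually cook it.
TASK = ("eventually",
        ("and", ("prop", "take_carrot"), ("eventually", ("prop", "cook_carrot"))))
```

Progressing `TASK` through a sequence of steps and checking when the result becomes `TRUE` gives a simple completion signal; a formula that collapses to `FALSE` can no longer be satisfied.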


A Survey on Quantum Reinforcement Learning

arXiv.org Artificial Intelligence

With recent advances in the fabrication and control of hardware for quantum information processing, the possibility of merging quantum computing (QC) with machine learning (ML) has received considerable attention within a growing research community. Reinforcement learning (RL) is the third ML paradigm, alongside supervised and unsupervised learning. In this survey article, we provide an overview of so-called quantum reinforcement learning (QRL) algorithms. We understand these as quantum-assisted approaches that solve a particular task (classical or quantum in nature) by employing quantum resources (in simulation and/or in experiment). To keep this contribution as self-contained as possible, we provide the necessary background before venturing into the QRL literature. We start with a brief recap of the essentials of the RL paradigm in the fully classical setting in Sec. 2. In Sec. 3 we then provide a quick introduction to QC and variational quantum circuits (VQCs). Readers familiar with either topic may safely skip these sections. In Sec. 4 we turn our attention to the emerging field of QRL, starting with a quick overview of the literature.


Challenges and Opportunities in Deep Reinforcement Learning with Graph Neural Networks: A Comprehensive Review of Algorithms and Applications

arXiv.org Artificial Intelligence

Deep reinforcement learning (DRL) has empowered a variety of artificial intelligence fields, including pattern recognition, robotics, recommendation systems, and gaming. Similarly, graph neural networks (GNNs) have demonstrated superior performance in supervised learning on graph-structured data. Recently, the fusion of GNNs with DRL for graph-structured environments has attracted a lot of attention. This paper provides a comprehensive review of these hybrid works. These works can be classified into two categories: (1) algorithmic enhancement, where DRL and GNN complement each other for better utility; (2) application-specific enhancement, where DRL and GNN support each other. This fusion effectively addresses various complex problems in engineering and life sciences. Based on the review, we further analyze the applicability and benefits of fusing these two domains, especially in terms of increasing generalizability and reducing computational complexity. Finally, the key challenges in integrating DRL and GNN, and potential future research directions, are highlighted, which will be of interest to the broader machine learning community.


Reinforcement Learning with Stepwise Fairness Constraints

arXiv.org Artificial Intelligence

Decision-making systems trained with real-world data are deployed ubiquitously in our daily life, for example in credit, education, and medical care. However, those decision systems may discriminate against disadvantaged groups due to biases in the data [16]. To mitigate this issue, many have proposed imposing fairness constraints [16, 20] on the decision, such that certain statistical parity properties are achieved. Although fair learning has been extensively studied, most of this work is in the static setting and does not consider the sequential feedback effects of decisions. Yet in many scenarios, algorithmic decisions may change the underlying features or qualification status of individuals, which in turn feeds back into the decision-making process; for example, banks' decisions may induce borrowers to react, such as changing their FICO score by closing credit cards.
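To make the notion of a statistical parity property concrete, the sketch below computes the gap in positive-decision rates between groups, first for a single batch of decisions and then separately at each time step of a sequence. The excerpt does not state the paper's exact fairness definition, so this uses the textbook form of statistical parity; the function names and the per-step data layout are assumptions for illustration.

```python
def statistical_parity_gap(decisions, groups):
    """Return (gap, rates): the largest difference in positive-decision rate
    between any two groups, and the per-group rates.

    decisions: iterable of 0/1 (or bool) outcomes, e.g. loan approved or not.
    groups: iterable of group labels, aligned with `decisions`.
    """
    totals, positives = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if decision else 0)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


def stepwise_parity_gaps(steps):
    """Compute the parity gap independently at every time step.

    steps: list of (decisions, groups) pairs, one pair per time step.
    """
    return [statistical_parity_gap(d, g)[0] for d, g in steps]


def satisfies_stepwise_constraint(steps, tolerance):
    """True if the gap stays within `tolerance` at every single step."""
    return all(gap <= tolerance for gap in stepwise_parity_gaps(steps))
```

Checking the gap at every step, rather than only on the pooled decisions, matters because a policy can look balanced on average while being strongly unbalanced at individual steps.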