 Markov Models


Lecture Notes on Partially Known MDPs

arXiv.org Artificial Intelligence

In these notes we will tackle the problem of finding optimal policies for Markov decision processes (MDPs) which are not fully known to us. Our intention is to slowly transition from an offline setting to an online (learning) setting. Namely, we are moving towards reinforcement learning. As a reminder, a (stationary) MDP M is a 4-tuple (S,A,P,r) where: - S is a finite set of states, - A is a finite set of actions. For intuition, this is just a graph-based way of saying that the reachable part of the Markov chain induced by the policy has a single closed communicating class. Let M be a communicating MDP and i one of its states.
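As a rough illustration of the definitions above, the following sketch stores a toy MDP as nested dictionaries and checks whether it is communicating. It assumes the standard graph characterization: draw an edge from s to s' whenever some action moves from s to s' with positive probability, and require every state to reach every other state. The toy MDP and the names `P`, `r`, `successors`, `reachable`, and `is_communicating` are hypothetical and not taken from the notes.

```python
from collections import deque

# Hypothetical toy MDP (S, A, P, r): P[s][a] maps next states to probabilities,
# r[s][a] is the reward for playing action a in state s.
P = {
    0: {"stay": {0: 1.0}, "go": {1: 1.0}},
    1: {"stay": {1: 1.0}, "go": {0: 0.5, 1: 0.5}},
}
r = {0: {"stay": 0.0, "go": 1.0}, 1: {"stay": 0.5, "go": 0.0}}


def successors(P, s):
    """States reachable from s in one step under at least one action."""
    return {t for dist in P[s].values() for t, p in dist.items() if p > 0}


def reachable(P, start):
    """Breadth-first search over the graph induced by all actions."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in successors(P, s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def is_communicating(P):
    """True if every state can reach every other state in the induced graph."""
    states = set(P)
    return all(reachable(P, s) == states for s in states)


print(is_communicating(P))
```

The check runs one breadth-first search per state, which is enough for small finite MDPs; a strongly-connected-components algorithm would do the same job in a single pass.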


ED2: An Environment Dynamics Decomposition Framework for World Model Construction

arXiv.org Artificial Intelligence

Model-based reinforcement learning (MBRL) methods achieve significant sample efficiency in many tasks, but their performance is often limited by model error. To reduce the model error, previous works use a single well-designed network to fit the entire environment dynamics, which treats the environment dynamics as a black box. However, these methods fail to consider that the environment dynamics may decompose into multiple sub-dynamics, which can be modeled separately, allowing us to construct the world model more accurately. In this paper, we propose the Environment Dynamics Decomposition (ED2), a novel world model construction framework that models the environment in a decomposed manner. ED2 contains two key components: sub-dynamics discovery (SD2) and dynamics decomposition prediction (D2P). SD2 discovers the sub-dynamics in an environment and then D2P constructs the decomposed world model following the sub-dynamics. ED2 can be easily combined with existing MBRL algorithms, and empirical results show that ED2 significantly reduces the model error and boosts the performance of the state-of-the-art MBRL algorithms on various tasks.


Understanding Dynamic Spatio-Temporal Contexts in Long Short-Term Memory for Road Traffic Speed Prediction

arXiv.org Artificial Intelligence

Reliable traffic flow prediction is crucial to creating intelligent transportation systems. Many big-data-based prediction approaches have been developed, but they do not reflect the complicated dynamic interactions between roads across time and location. In this study, we propose a dynamically localised long short-term memory (LSTM) model that captures both spatial and temporal dependence between roads. To do so, we use a localised dynamic spatial weight matrix along with its dynamic variation. Moreover, the LSTM model can deal with sequential data with long-range dependencies as well as complex non-linear features. Empirical results indicated superior prediction performance of the proposed model compared to two different baseline methods.


Analyzing Patient Trajectories With Artificial Intelligence

#artificialintelligence

For example, electronic health records store the history of a patient's diagnoses, medications, laboratory values, and treatment plans [1-3]. Wearables collect granular sensor measurements of various neurophysiological body functions over time [4-6]. Intensive care units (ICUs) monitor disease progression via continuous physiological measurements (eg, electrocardiograms) [7-10]. As a result, patient data in digital medicine are regularly of longitudinal form (ie, consisting of health events from multiple time points) and thus form patient trajectories. Analyzing patient trajectories provides opportunities for more effective care in digital medicine [2,7,11]. Patient trajectories encode rich information on the history of health states that are also predictive of the future course of a disease (eg, individualized differences in disease progression or responsiveness to medications) [9,10,12]. As such, it is possible to construct patient trajectories that capture the entire disease course and characterize the many possible disease progression patterns, such as recurrent, stable, or rapidly deteriorating disease states (Figure 1). Hence, modeling the patient trajectories allows one to build robust models of diseases that capture disease dynamics seen in patient trajectories. Here, we replace disease models that use data from only a single or a small number of time points with disease models that account for the longitudinal nature of patient trajectories, thus offering vast potential for digital medicine. Several studies have previously introduced artificial intelligence (AI) in medicine for practitioners [13,14].


Learning a Robust Multiagent Driving Policy for Traffic Congestion Reduction

arXiv.org Artificial Intelligence

The advent of automated and autonomous vehicles (AVs) creates opportunities to achieve system-level goals using multiple AVs, such as traffic congestion reduction. Past research has shown that multiagent congestion-reducing driving policies can be learned in a variety of simulated scenarios. While initial proofs of concept were in small, closed traffic networks with a centralized controller, successful results have recently been demonstrated in more realistic settings with distributed control policies operating in open road networks where vehicles enter and leave. However, these driving policies were mostly tested under the same conditions they were trained on, and have not been thoroughly tested for robustness to different traffic conditions, which is a critical requirement in real-world scenarios. This paper presents a learned multiagent driving policy that is robust to a variety of open-network traffic conditions, including vehicle flows, the fraction of AVs in traffic, AV placement, and different merging road geometries. A thorough empirical analysis investigates the sensitivity of such a policy to the number of AVs in both a simple merge network and a more complex road with two merging ramps. It shows that the learned policy achieves significant improvement over simulated human-driven policies even with AV penetration as low as 2%. The same policy is also shown to be capable of reducing traffic congestion in more complex roads with two merging ramps.


Sample Complexity of Robust Reinforcement Learning with a Generative Model

arXiv.org Machine Learning

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against parameter uncertainties arising from mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems.
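To make the max-min formulation concrete, here is a minimal sketch of robust dynamic programming when the nominal model is known. It is not the paper's algorithm, which has to estimate the nominal model from samples. The sketch assumes a total variation uncertainty set of radius `rho` around each nominal next-state distribution, chosen independently for every state-action pair, and a discounted reward objective. The function names, the array layout, and the toy numbers are all hypothetical.

```python
import numpy as np


def worst_case_expectation(p0, v, rho):
    """Minimize p @ v over distributions p with 0.5 * ||p - p0||_1 <= rho.

    Greedy solution: shift up to rho probability mass from the
    highest-value states onto the lowest-value state.
    """
    p = np.array(p0, dtype=float)
    v = np.asarray(v, dtype=float)
    lowest = int(np.argmin(v))
    budget = rho
    for s in np.argsort(v)[::-1]:  # visit highest-value states first
        if budget <= 0:
            break
        if s == lowest:
            continue
        move = min(p[s], budget)
        p[s] -= move
        p[lowest] += move
        budget -= move
    return float(p @ v)


def robust_value_iteration(P0, R, gamma=0.9, rho=0.1, iters=200):
    """P0 has shape (S, A, S) (nominal model); R has shape (S, A)."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        # Outer max over actions, inner min over the uncertainty set.
        Q = np.array([
            [R[s, a] + gamma * worst_case_expectation(P0[s, a], V, rho)
             for a in range(n_actions)]
            for s in range(n_states)
        ])
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)


# Hypothetical two-state, two-action nominal model.
P0 = np.array([[[0.9, 0.1], [0.2, 0.8]],
               [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[0.0, 1.0],
              [0.5, 2.0]])

V_nominal, _ = robust_value_iteration(P0, R, rho=0.0)
V_robust, policy = robust_value_iteration(P0, R, rho=0.2)
print(V_nominal, V_robust, policy)
```

Setting `rho=0.0` recovers ordinary value iteration on the nominal model, so comparing the two outputs shows how much value the worst-case model takes away.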


Maximum Entropy Model-based Reinforcement Learning

arXiv.org Artificial Intelligence

Recent advances in reinforcement learning have demonstrated its ability to solve hard agent-environment interaction tasks on a super-human level. However, the application of reinforcement learning methods to practical and real-world tasks is currently limited by the sample inefficiency of most state-of-the-art RL algorithms, i.e., the need for a vast number of training episodes. For example, the OpenAI Five algorithm, which has beaten human players in Dota 2, trained for thousands of years of game time. Several approaches tackle the issue of sample inefficiency, either by making more efficient use of already gathered experience or by aiming to gain more relevant and diverse experience via better exploration of an environment. However, to our knowledge, no such approach exists for model-based algorithms, which have shown high sample efficiency in solving hard control tasks with high-dimensional state spaces. This work connects exploration techniques and model-based reinforcement learning. We have designed a novel exploration method that takes into account features of the model-based approach. We also demonstrate through experiments that our method significantly improves the performance of the model-based algorithm Dreamer.


Robust Robotic Control from Pixels using Contrastive Recurrent State-Space Models

arXiv.org Artificial Intelligence

Modeling the world can benefit robot learning by providing a rich training signal for shaping an agent's latent state space. However, learning world models in unconstrained environments over high-dimensional observation spaces such as images is challenging. One source of difficulty is the presence of irrelevant but hard-to-model background distractions, and unimportant visual details of task-relevant entities. We address this issue by learning a recurrent latent dynamics model which contrastively predicts the next observation. This simple model leads to surprisingly robust robotic control even with simultaneous camera, background, and color distractions. We outperform alternatives such as bisimulation methods which impose state-similarity measures derived from divergence in future reward or future optimal actions. We obtain state-of-the-art results on the Distracting Control Suite, a challenging benchmark for pixel-based robotic control.


Unsupervised Machine Learning Hidden Markov Models in Python

#artificialintelligence

Created by Lazy Programmer Inc. The Hidden Markov Model or HMM is all about learning sequences. A lot of the data that would be very useful for us to model is in sequences. Stock prices are sequences of prices. Language is a sequence of words. Credit scoring involves sequences of borrowing and repaying money, and we can use those sequences to predict whether or not you're going to default.


On the challenges of using D-Wave computers to sample Boltzmann Random Variables

arXiv.org Machine Learning

Sampling random variables following a Boltzmann distribution is an NP-hard problem involved in various applications such as training of Boltzmann machines, a specific kind of neural network. Several attempts have been made to use a D-Wave quantum computer to sample such a distribution, as this could lead to significant speedup in these applications. Yet, at present, several challenges to performing such sampling efficiently remain. We detail the various obstacles and explain the remaining difficulties in solving the sampling problem on a D-Wave machine.