 Undirected Networks


Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations

arXiv.org Machine Learning

In real-world scenarios, the observation data for reinforcement learning with continuous control is commonly noisy, and part of it may be dynamically missing over time, which violates the assumptions of many current methods developed for this setting. We address the issue within the framework of the partially observable Markov decision process (POMDP) using a model-based method, in which the transition model is estimated from the incomplete and noisy observations using a newly proposed surrogate loss function with local approximation, while the policy and value function are learned with the help of belief imputation. For the latter purpose, a generative model is constructed and seamlessly incorporated into the belief updating procedure of the POMDP, which enables robust execution even under significant incompleteness and noise. The effectiveness of the proposed method is verified on a collection of benchmark tasks, showing that our approach outperforms several compared methods under various challenging scenarios.
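
To illustrate how a belief update can tolerate missing observations, here is a minimal sketch of a discrete Bayes filter. It is not the paper's generative imputation method: the two-state model, the matrices `T` and `O`, the function name `belief_update`, and the omission of actions are all assumptions made for illustration. When an observation is missing, only the prediction step is applied.

```python
def belief_update(belief, transition, likelihood, observation=None):
    """One step of a discrete Bayes filter.

    belief[i]         : current probability of hidden state i
    transition[i][j]  : P(next state j | state i)  (actions omitted for brevity)
    likelihood[j][o]  : P(observation o | state j)
    observation       : observed symbol index, or None if it is missing
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    if observation is None:
        # Missing observation: no correction is possible, keep the prediction.
        return predicted
    # Correction step: weight by the observation likelihood and renormalize.
    unnormalized = [predicted[j] * likelihood[j][observation] for j in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical two-state model used only for this example.
T = [[0.9, 0.1], [0.2, 0.8]]
O = [[0.8, 0.2], [0.3, 0.7]]
b0 = [0.5, 0.5]

b_missing = belief_update(b0, T, O)               # observation dropped
b_seen = belief_update(b0, T, O, observation=1)   # observation available
```

In this toy model, observing symbol 1 shifts the belief toward state 1 compared with the prediction-only update.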


The Value Function Polytope in Reinforcement Learning

arXiv.org Machine Learning

We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem, which shows that the value functions of policies constrained on all but one state describe a line segment. Finally, we use this novel perspective to introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms.
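
The line theorem can be checked numerically on a small example. The sketch below assumes a hypothetical two-state, two-action MDP (the numbers in `P` and `R`, the discount `gamma`, and the helper names are invented for illustration). It fixes the policy in state 1, mixes the two actions in state 0, and verifies that the resulting value functions are collinear.

```python
def policy_value(row0, r0, row1, r1, gamma):
    """Solve (I - gamma * P_pi) V = r_pi for a two-state chain via Cramer's rule."""
    a, b = 1 - gamma * row0[0], -gamma * row0[1]
    c, d = -gamma * row1[0], 1 - gamma * row1[1]
    det = a * d - b * c
    return ((d * r0 - b * r1) / det, (a * r1 - c * r0) / det)


# Hypothetical MDP: P[s][a] is the next-state distribution, R[s][a] the reward.
P = [[(0.7, 0.3), (0.2, 0.8)],
     [(0.4, 0.6), (0.9, 0.1)]]
R = [[1.0, 0.0],
     [0.5, 2.0]]
gamma = 0.9


def value_for_mixture(alpha):
    """Value of the policy that plays action 1 w.p. alpha in state 0, action 0 in state 1."""
    row0 = tuple((1 - alpha) * p0 + alpha * p1 for p0, p1 in zip(P[0][0], P[0][1]))
    r0 = (1 - alpha) * R[0][0] + alpha * R[0][1]
    return policy_value(row0, r0, P[1][0], R[1][0], gamma)


values = [value_for_mixture(k / 4) for k in range(5)]

# Collinearity check: every point minus the first is parallel to (last - first).
v_start, v_end = values[0], values[-1]
direction = (v_end[0] - v_start[0], v_end[1] - v_start[1])
cross_products = [(v[0] - v_start[0]) * direction[1] - (v[1] - v_start[1]) * direction[0]
                  for v in values]
```

Every entry of `cross_products` should be zero up to floating-point error, which is what lying on a single line segment means in two dimensions.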


Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles

arXiv.org Artificial Intelligence

Spatial puzzles composed of rigid objects, flexible strings and holes offer interesting domains for reasoning about spatial entities that are common in everyday human activities. The goal of this work is to investigate the automated solution of this kind of puzzle by adapting oASP(MDP), an algorithm that combines Answer Set Programming (ASP) with Markov Decision Processes (MDP), to use heuristics that accelerate the learning process. ASP is applied to represent the domain as an MDP, while a Reinforcement Learning algorithm (Q-Learning) is used to find the optimal policies. In this work, the heuristics were obtained from the solutions of relaxed versions of the puzzles. Experiments were performed on deterministic, non-deterministic and non-stationary versions of the puzzles. Results show that the proposed approach can accelerate the learning process, presenting an advantage when compared to the non-heuristic versions of oASP(MDP) and Q-Learning.


Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning

arXiv.org Artificial Intelligence

We pose an active perception problem where an autonomous agent actively interacts with a second agent with potentially adversarial behaviors. Given the uncertainty in the intent of the other agent, the objective is to collect further evidence to help discriminate potential threats. The main technical challenges are the partial observability of the other agent's intent, the adversary modeling, and the corresponding uncertainty modeling. Note that an adversarial agent may act to mislead the autonomous agent by using a deceptive strategy that is learned from past experiences. We propose an approach that combines belief space planning, generative adversary modeling, and maximum entropy reinforcement learning to obtain a stochastic belief space policy. By accounting for various adversarial behaviors in the simulation framework and minimizing the predictability of the autonomous agent's action, the resulting policy is more robust to unmodeled adversarial strategies. This improved robustness is empirically shown against an adversary that adapts to and exploits the autonomous agent's policy when compared with a standard Chance-Constraint Partially Observable Markov Decision Process robust approach.


NAIL: A General Interactive Fiction Agent

arXiv.org Artificial Intelligence

Interactive Fiction (IF) games are complex textual decision making problems. This paper introduces NAIL, an autonomous agent for general parser-based IF games. NAIL won the 2018 Text Adventure AI Competition, where it was evaluated on twenty unseen games. This paper describes the architecture, development, and insights underpinning NAIL's performance.


Markov Chain-based Cost-Optimal Control Charts for Healthcare Data

arXiv.org Machine Learning

Control charts have traditionally been used in industrial statistics, but are constantly seeing new areas of application, especially in the age of Industry 4.0. This paper introduces a new method, which is suitable for applications in the healthcare sector, especially for monitoring a health characteristic of a patient. We adapt a Markov chain-based approach and develop a method in which not only the shift size (i.e. the degradation of the patient's health) can be random, but also the effect of the repair (i.e. treatment) and the time between samplings (i.e. visits). This means that we avoid many commonly made assumptions that are usually not applicable to medical treatments. The average cost of the protocol, which is determined by the time between samplings and the control limit, can be estimated using the stationary distribution of the Markov chain. Furthermore, we incorporate the standard deviation of the cost into the optimisation procedure, which is often very important from a process control viewpoint. The sensitivity of the optimal parameters and of the resulting average cost and cost standard deviation to different parameter values is investigated. We demonstrate the usefulness of the approach on real-life data of patients treated in Hungary, namely the monitoring of the cholesterol level of patients at risk of cardiovascular events. The results showed that the optimal parameters from our approach can be somewhat different from the original medical parameters.
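
The step of estimating an average cost from a stationary distribution can be sketched as follows. This is an illustration only: the three-state chain (in control, shifted, under treatment), the transition matrix `P`, and the per-state costs are invented, and the standard deviation computed here is simply that of the per-step cost under the stationary distribution, which is a simplification of what the paper optimises.

```python
import math


def stationary_distribution(P, max_iters=10_000, tol=1e-12):
    """Approximate the stationary distribution of an ergodic chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical chain: state 0 = in control, 1 = shifted, 2 = under treatment.
P = [[0.9, 0.1, 0.0],
     [0.0, 0.7, 0.3],
     [0.8, 0.0, 0.2]]
costs = [1.0, 5.0, 10.0]  # assumed cost incurred per step in each state

pi = stationary_distribution(P)
average_cost = sum(p * c for p, c in zip(pi, costs))
cost_std = math.sqrt(sum(p * c * c for p, c in zip(pi, costs)) - average_cost ** 2)
```

Changing the entries of `P` (for example, how quickly a shift is detected and treated) changes `pi`, and with it both the average cost and its spread, which is the trade-off an optimisation over sampling interval and control limit would explore.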


State Duration and Interval Modeling in Hidden Semi-Markov Model for Sequential Data Analysis

arXiv.org Artificial Intelligence

Sequential data modeling and analysis have become indispensable for data such as time series, because larger amounts of sensed event data have become available. These methods capture the sequential structure of the data of interest, such as input-output relations and correlation among datasets. However, because most studies in this area are specialized or limited to their respective applications, rigorous requirement analysis of such models has not been undertaken from a general perspective. Therefore, we examine the structure of sequential data and identify the need for the "state duration" and "state interval" of events for an efficient and rich representation of sequential data. Specifically addressing the hidden semi-Markov model (HSMM), which represents such state duration inside a model, we attempt to add the capability to represent the state interval of events to the HSMM. To this end, we propose two extended models: an interval state hidden semi-Markov model (IS-HSMM), which expresses the length of a state interval with a special state node designated as the "interval state node"; and an interval length probability hidden semi-Markov model (ILP-HSMM), which represents the length of the state interval with a new probabilistic parameter, the "interval length probability." Exhaustive simulations have revealed superior performance of the proposed models in comparison with HSMM. These proposed models are the first reported extensions of HMM to support state interval representation as well as state duration representation.


Sample-Optimal Parametric Q-Learning with Linear Transition Models

arXiv.org Machine Learning

Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model. We propose a parametric Q-learning algorithm that finds an approximately optimal policy using a sample size proportional to the feature dimension $K$ and invariant with respect to the size of the state space. To further improve its sample efficiency, we exploit the monotonicity property and intrinsic noise structure of the Bellman operator, provided that there exist anchor state-actions that imply implicit non-negativity in the feature space. We augment the algorithm using techniques of variance reduction, monotonicity preservation, and confidence bounds. It is proved to find a policy which is $\epsilon$-optimal from any initial state with high probability using $\widetilde{O}\left(\frac{K}{\epsilon^2(1-\gamma)^3}\right)$ sample transitions for an arbitrarily large-scale MDP with a discount factor $\gamma\in(0,1)$. A matching information-theoretical lower bound is proved, confirming the sample optimality of the proposed method with respect to all parameters (up to polylog factors).


How AI could help you learn sign language

#artificialintelligence

Sign languages aren't easy to learn and are even harder to teach. They use not just hand gestures but also mouthings, facial expressions and body posture to communicate meaning. This complexity means professional teaching programs are still rare and often expensive. But this could all change soon, with a little help from artificial intelligence (AI). My colleagues and I are working on software for teaching yourself sign languages in an automated, intuitive way.


Bayesian Online Detection and Prediction of Change Points

arXiv.org Machine Learning

Online detection of instantaneous changes in the generative process of a data sequence generally focuses on retrospective inference of such change points without considering their future occurrences. We extend the Bayesian Online Change Point Detection algorithm to also infer the number of time steps until the next change point (i.e., the residual time). This enables us to handle observation models which depend on the total segment duration, which is useful for modeling data sequences with temporal scaling. In addition, we extend the model by removing the i.i.d. assumption on the observation model parameters. The resulting inference algorithm for segment detection can be deployed in an online fashion, and we illustrate applications to synthetic data and to two real-world medical data sets.
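
For context, the base Bayesian Online Change Point Detection recursion maintains a posterior over the current run length (the time since the last change point). The sketch below shows only that base recursion, under assumptions chosen for brevity: Bernoulli observations with a Beta prior and a constant hazard rate. The function name and parameters are hypothetical, and the residual-time inference described in the abstract is not implemented here.

```python
def bocpd_bernoulli(data, hazard=0.05, a0=1.0, b0=1.0):
    """Run-length posteriors for binary data under a constant hazard rate.

    Returns a list whose t-th entry is the posterior over run lengths
    0..t+1 after seeing data[:t+1].
    """
    run_probs = [1.0]      # before any data, the run length is 0 with certainty
    params = [(a0, b0)]    # Beta parameters associated with each run length
    posteriors = []
    for x in data:
        # Posterior predictive probability of x under each run-length hypothesis.
        pred = [a / (a + b) if x == 1 else b / (a + b) for a, b in params]
        # Growth: the current run continues, so each run length increases by one.
        growth = [p * q * (1 - hazard) for p, q in zip(run_probs, pred)]
        # Change point: the run length resets to zero.
        change = sum(p * q * hazard for p, q in zip(run_probs, pred))
        joint = [change] + growth
        evidence = sum(joint)
        run_probs = [j / evidence for j in joint]
        # A fresh prior for run length 0, updated statistics for the rest.
        params = [(a0, b0)] + [(a + x, b + 1 - x) for a, b in params]
        posteriors.append(run_probs)
    return posteriors


# Synthetic binary sequence with an abrupt change halfway through.
data = [0] * 20 + [1] * 20
posteriors = bocpd_bernoulli(data)
```

Each entry of `posteriors` is a normalized distribution, so it can be inspected online after every new data point, for example by tracking its most probable run length.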