Markov Models
Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery
Ondel, Lucas, Vydana, Hari Krishna, Burget, Lukáš, Černocký, Jan
This work tackles the problem of learning a set of language-specific acoustic units from unlabeled speech recordings given a set of labeled recordings from other languages. Our approach may be described as a two-step procedure: first, the model learns the notion of acoustic units from the labeled data; then, the model uses this knowledge to find new acoustic units in the target language. We implement this process with the Bayesian Subspace Hidden Markov Model (SHMM), a model akin to the Subspace Gaussian Mixture Model (SGMM) where each low-dimensional embedding represents an acoustic unit rather than just an HMM state. The subspace is trained on three languages from the GlobalPhone corpus (German, Polish and Spanish) and the acoustic units (AUs) are discovered on the TIMIT corpus. Results, measured in equivalent Phone Error Rate, show that this approach significantly outperforms previous HMM-based acoustic unit discovery systems and compares favorably with the Variational Auto Encoder-HMM.
Reinforced Imitation in Heterogeneous Action Space
Zolna, Konrad, Rostamzadeh, Negar, Bengio, Yoshua, Ahn, Sungjin, Pinheiro, Pedro O.
Imitation learning is an effective alternative approach to learn a policy when the reward function is sparse. In this paper, we consider a challenging setting where an agent and an expert use different actions from each other. We assume that the agent has access to a sparse reward function and state-only expert observations. We propose a method which gradually balances between the imitation learning cost and the reinforcement learning objective. In addition, this method adapts the agent's policy based on either mimicking expert behavior or maximizing sparse reward. We show, through navigation scenarios, that (i) an agent is able to efficiently leverage sparse rewards to outperform standard state-only imitation learning, (ii) it can learn a policy even when its actions are different from the expert, and (iii) the performance of the agent is not bounded by that of the expert, due to the optimized usage of sparse rewards.
Reinforcement Learning with Attention that Works: A Self-Supervised Approach
Manchin, Anthony, Abbasnejad, Ehsan, Hengel, Anton van den
Attention models have had a significant positive impact on deep learning across a range of tasks. However, previous attempts at integrating attention with reinforcement learning have failed to produce significant improvements. We propose the first combination of self-attention and reinforcement learning that is capable of producing significant improvements, including new state-of-the-art results in the Arcade Learning Environment. Unlike the selective attention models used in previous attempts, which constrain the attention via preconceived notions of importance, our implementation utilises the Markovian properties inherent in the state input. Our method produces a faithful visualisation of the policy, focusing on the behaviour of the agent. Our experiments demonstrate that the trained policies use multiple simultaneous foci of attention, and are able to modulate attention over time to deal with situations of partial observability.
Combining Offline Models and Online Monte-Carlo Tree Search for Planning from Scratch
Planning in stochastic and partially observable environments is a central issue in artificial intelligence. One commonly used technique for solving such problems is to first construct an accurate model. Although some recent approaches have been proposed for learning optimal behaviour under model uncertainty, prior knowledge about the environment is still needed to guarantee the performance of the proposed algorithms. Building on the Predictive State Representations (PSRs) approach for state representation and model prediction, in this paper we introduce an approach for planning from scratch, in which an offline PSR model is first learned and then combined with online Monte-Carlo tree search for planning with model uncertainty. By comparing with the state-of-the-art approach to planning with model uncertainty, we demonstrate the effectiveness of the proposed approaches, along with a proof of their convergence. The effectiveness and scalability of our proposed approach are also tested on the RockSample problem, which is infeasible for the state-of-the-art BA-POMDP based approaches.
Goodness of Fit Testing for Dynamic Networks
Magner, Abram, Szpankowski, Wojciech
Numerous networks in the real world change over time, in the sense that nodes and edges enter and leave the networks. Various dynamic random graph models have been proposed to explain the macroscopic properties of these systems and to provide a foundation for statistical inferences and predictions. It is of interest to have a rigorous way to determine how well these models match observed networks. We thus ask the following goodness of fit question: given a sequence of observations/snapshots of a growing random graph, along with a candidate model $M$, can we determine whether the snapshots came from $M$ or from some arbitrary alternative model that is well-separated from $M$ in some natural metric? We formulate this problem precisely and boil it down to goodness of fit testing for graph-valued, infinite-state Markov processes and exhibit and analyze a test based on a procedure that we call non-stationary sampling for a natural class of models.
Diversified Hidden Markov Models for Sequential Labeling
Qiao, Maoying, Bian, Wei, Xu, Richard Yida, Tao, Dacheng
Labeling of sequential data is a prevalent meta-problem for a wide range of real-world applications. While the first-order Hidden Markov Model (HMM) provides a fundamental approach for unsupervised sequential labeling, the basic model does not show satisfactory performance when it is directly applied to real-world problems, such as part-of-speech tagging (PoS tagging) and optical character recognition (OCR). Aiming at improving performance, important extensions of HMM have been proposed in the literature. One of the common key features of these extensions is the incorporation of proper prior information. In this paper, we propose a new extension of HMM, termed diversified Hidden Markov Models (dHMM), which utilizes a diversity-encouraging prior over the state-transition probabilities and thus facilitates more dynamic sequential labelings. Specifically, the diversity is modeled by a continuous determinantal point process prior, which we apply to both unsupervised and supervised scenarios. Learning and inference algorithms for dHMM are derived. Empirical evaluations on benchmark datasets for unsupervised PoS tagging and supervised OCR confirm the effectiveness of dHMM, with performance competitive with the state of the art.
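As a point of reference for the first-order HMM that dHMM extends, the following is a minimal sketch of Viterbi decoding for sequential labeling. It is not the dHMM learning or inference algorithm from the paper and does not include the determinantal point process prior; the two tags, the vocabulary, and all probabilities are hypothetical values chosen only for illustration.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state sequence under a first-order HMM, computed in log space."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t][s]: predecessor of s on that best path
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def to_log(table):
    """Convert a dict of probabilities (all non-zero here) to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Hypothetical toy tagger with two tags and two words.
states = ["DET", "NOUN"]
log_start = to_log({"DET": 0.8, "NOUN": 0.2})
log_trans = {"DET": to_log({"DET": 0.1, "NOUN": 0.9}),
             "NOUN": to_log({"DET": 0.4, "NOUN": 0.6})}
log_emit = {"DET": to_log({"the": 0.9, "dog": 0.1}),
            "NOUN": to_log({"the": 0.1, "dog": 0.9})}

tags = viterbi(["the", "dog"], states, log_start, log_trans, log_emit)
```

With these hand-set parameters the decoder labels "the dog" as DET followed by NOUN. In the unsupervised setting the abstract refers to, the parameters would be estimated from unlabeled data rather than set by hand.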
Artificial Intelligence and Game Theory Models for Defending Critical Networks with Cyber Deception
Fugate, Sunny (Space and Naval Warfare Systems Center Pacific) | Ferguson-Walter, Kimberly (US Department of Defense)
Traditional cyber security techniques have led to an asymmetric disadvantage for defenders. The defender must detect all possible threats at all times from all attackers and defend all systems against all possible exploitation. In contrast, an attacker needs only to find a single path to the defender’s critical information. In this article, we discuss how this asymmetry can be rebalanced using cyber deception to change the attacker’s perception of the network environment, and lead attackers to false beliefs about which systems contain critical information or are critical to a defender’s computing infrastructure. We introduce game theory concepts and models to represent and reason over the use of cyber deception by the defender and the effect it has on attacker perception. Finally, we discuss techniques for combining artificial intelligence algorithms with game theory models to estimate hidden states of the attacker using feedback through payoffs to learn how best to defend the system using cyber deception. It is our opinion that adaptive cyber deception is a necessary component of future information systems and networks. The techniques we present can simultaneously decrease the risks and impacts suffered by defenders and dramatically increase the costs and risks of detection for attackers. Such techniques are likely to play a pivotal role in defending national and international security concerns.
Online Risk-Bounded Motion Planning for Autonomous Vehicles in Dynamic Environments
Huang, Xin, Hong, Sungkweon, Hofmann, Andreas, Williams, Brian C.
A crucial challenge to efficient and robust motion planning for autonomous vehicles is understanding the intentions of the surrounding agents. Ignoring the intentions of the other agents in dynamic environments can lead to risky or over-conservative plans. In this work, we model the motion planning problem as a partially observable Markov decision process (POMDP) and propose an online system that combines an intent recognition algorithm and a POMDP solver to generate risk-bounded plans for the ego vehicle navigating among a number of dynamic agent vehicles. The intent recognition algorithm predicts the probabilistic hybrid motion states of each agent vehicle over a finite horizon using Bayesian filtering and a library of pre-learned maneuver motion models. We update the POMDP model with the intent recognition results in real time and solve it using a heuristic search algorithm which produces policies with upper-bound guarantees on the probability of near-collisions with other dynamic agents. We demonstrate that, compared to the baseline methods, our system generates better motion plans in terms of efficiency and safety in a number of challenging environments, including unprotected intersection left turns and lane changes.
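To make the Bayesian filtering step concrete, here is a minimal sketch of one predict-update cycle of a discrete Bayes filter over maneuver intents. It is an illustration only, not the paper's hybrid-state intent recognition algorithm: the intent names, the transition model, and the observation likelihoods are all hypothetical.

```python
def bayes_filter_step(belief, transition, likelihood):
    """One predict-update step of a discrete Bayes filter.

    belief:     {intent: P(intent | observations so far)}
    transition: {prev_intent: {next_intent: P(next | prev)}}
    likelihood: {intent: P(current observation | intent)}
    """
    # Predict: push the belief through the intent transition model.
    predicted = {
        nxt: sum(belief[prev] * transition[prev][nxt] for prev in belief)
        for nxt in belief
    }
    # Update: weight by the observation likelihood, then renormalise.
    unnormalised = {i: predicted[i] * likelihood[i] for i in predicted}
    total = sum(unnormalised.values())
    return {i: p / total for i, p in unnormalised.items()}

# Hypothetical numbers: two intents, a "sticky" transition model, and an
# observation that is four times more likely under a left turn.
belief = {"go_straight": 0.5, "turn_left": 0.5}
transition = {
    "go_straight": {"go_straight": 0.9, "turn_left": 0.1},
    "turn_left": {"go_straight": 0.1, "turn_left": 0.9},
}
likelihood = {"go_straight": 0.2, "turn_left": 0.8}

posterior = bayes_filter_step(belief, transition, likelihood)
```

Under these assumed values the belief in "turn_left" rises from 0.5 to 0.8 after a single observation. A planner could consume such a posterior as its estimate of the other vehicle's intent.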
Reinforcement Learning Demystified: Markov Decision Processes (Part 1)
In the previous blog post we talked about reinforcement learning and its characteristics. We mentioned the process of the agent observing the environment output consisting of a reward and the next state, and then acting upon that. This whole process is a Markov Decision Process or an MDP for short. This blog post is a bit mathy. Grab your coffee and a comfortable chair, and just dive in.
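Before the math, it can help to see the loop in code. Here is a minimal sketch of an agent interacting with a tiny MDP: the agent picks an action, and the environment answers with a reward and the next state. The two states, the actions, the probabilities, and the fixed policy are all made up for illustration and are not taken from the post.

```python
import random

# Hypothetical two-state MDP.
# P[state][action] is a list of (probability, next_state, reward) outcomes.
P = {
    "cool": {"work": [(0.8, "cool", 1.0), (0.2, "hot", 1.0)],
             "rest": [(1.0, "cool", 0.0)]},
    "hot": {"work": [(1.0, "hot", -1.0)],
            "rest": [(0.7, "cool", 0.0), (0.3, "hot", 0.0)]},
}

def step(state, action, rng):
    """Sample (next_state, reward). Markov property: the outcome depends
    only on the current state and action, not on earlier history."""
    outcomes = P[state][action]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward

def policy(state):
    """A fixed, hand-written policy: work while cool, rest while hot."""
    return "work" if state == "cool" else "rest"

def rollout(start, n_steps, seed=0):
    """Run the agent-environment loop and record each transition."""
    rng = random.Random(seed)
    state, trajectory = start, []
    for _ in range(n_steps):
        action = policy(state)
        next_state, reward = step(state, action, rng)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory

trajectory = rollout("cool", n_steps=5)
total_reward = sum(reward for _, _, reward, _ in trajectory)
```

Each entry of `trajectory` is one (state, action, reward, next state) tuple, which is exactly the piece of experience an agent gets to learn from.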
Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models
Bıyık, Erdem, Margoliash, Jonathan, Alimo, Shahrouz Ryan, Sadigh, Dorsa
I. INTRODUCTION. Guaranteeing safety is a vital issue for many modern robotics systems, such as unmanned aerial vehicles (UAVs), autonomous cars, or domestic robots [1], [2], [3]. One approach is to attempt to specify all potential scenarios [...] Process (MDP) using Gaussian processes. In their work, they assumed the transition model is known and that there exists a predefined safety function. Both of these assumptions can be quite restrictive when the system is going to operate in unknown environments. In our work, we plan to address both of these challenges by considering unknown transition models, and no access to a predefined safety function.