Markov Models
State Duration and Interval Modeling in Hidden Semi-Markov Model for Sequential Data Analysis
Narimatsu, Hiromi, Kasai, Hiroyuki
Sequential data modeling and analysis have become indispensable tools for analyzing sequential data, such as time-series data, because larger amounts of sensed event data have become available. These methods capture the sequential structure of data of interest, such as input-output relations and correlation among datasets. However, because most studies in this area are specialized or limited to their respective applications, rigorous requirement analysis of such models has not been undertaken from a general perspective. Therefore, we particularly examine the structure of sequential data, and extract the necessity of `state duration' and `state interval' of events for efficient and rich representation of sequential data. Specifically addressing the hidden semi-Markov model (HSMM) that represents such state duration inside a model, we attempt to add representational capability of a state interval of events onto HSMM. To this end, we propose two extended models: an interval state hidden semi-Markov model (IS-HSMM) to express the length of a state interval with a special state node designated as "interval state node"; and an interval length probability hidden semi-Markov model (ILP-HSMM) which represents the length of the state interval with a new probabilistic parameter "interval length probability." Exhaustive simulations have revealed superior performance of the proposed models in comparison with HSMM. These proposed models are the first reported extensions of HMM to support state interval representation as well as state duration representation.
Sample-Optimal Parametric Q-Learning with Linear Transition Models
Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model. We propose a parametric Q-learning algorithm that finds an approximate-optimal policy using a sample size proportional to the feature dimension $K$ and invariant with respect to the size of the state space. To further improve its sample efficiency, we exploit the monotonicity property and intrinsic noise structure of the Bellman operator, provided the existence of anchor state-actions that imply implicit non-negativity in the feature space. We augment the algorithm using techniques of variance reduction, monotonicity preservation, and confidence bounds. It is proved to find a policy which is $\epsilon$-optimal from any initial state with high probability using $\widetilde{O}(K/\epsilon^2(1-\gamma)^3)$ sample transitions for arbitrarily large-scale MDP with a discount factor $\gamma\in(0,1)$. A matching information-theoretical lower bound is proved, confirming the sample optimality of the proposed method with respect to all parameters (up to polylog factors).
How AI could help you learn sign language
Sign languages aren't easy to learn and are even harder to teach. They use not just hand gestures but also mouthings, facial expressions and body posture to communicate meaning. This complexity means professional teaching programs are still rare and often expensive. But this could all change soon, with a little help from artificial intelligence (AI). My colleagues and I are working on software for teaching yourself sign languages in an automated, intuitive way.
Bayesian Online Detection and Prediction of Change Points
Agudelo-Espaรฑa, Diego, Gomez-Gonzalez, Sebastian, Bauer, Stefan, Schรถlkopf, Bernhard, Peters, Jan
Online detection of instantaneous changes in the generative process of a data sequence generally focuses on retrospective inference of such change points without considering their future occurrences. We extend the Bayesian Online Change Point Detection algorithm to also infer the number of time steps until the next change point (i.e., the residual time). This enables us to handle observation models which depend on the total segment duration, which is useful to model data sequences with temporal scaling. In addition, we extend the model by removing the i.i.d. assumption on the observation model parameters. The resulting inference algorithm for segment detection can be deployed in an online fashion, and we illustrate applications to synthetic and to two medical real-world data sets.
Emergence of Hierarchy via Reinforcement Learning Using a Multiple Timescale Stochastic RNN
Han, Dongqi, Doya, Kenji, Tani, Jun
Although recurrent neural networks (RNNs) for reinforcement learning (RL) have addressed unique advantages in various aspects, e. g., solving memory-dependent tasks and meta-learning, very few studies have demonstrated how RNNs can solve the problem of hierarchical RL by autonomously developing hierarchical control. In this paper, we propose a novel model-free RL framework called ReMASTER, which combines an off-policy actor-critic algorithm with a multiple timescale stochastic recurrent neural network for solving memory-dependent and hierarchical tasks. We performed experiments using a challenging continuous control task and showed that: (1) Internal representation necessary for achieving hierarchical control autonomously develops through exploratory learning. (2) Stochastic neurons in RNNs enable faster relearning when adapting to a new task which is a recomposition of sub-goals previously learned.
WiseMove: A Framework for Safe Deep Reinforcement Learning for Autonomous Driving
Lee, Jaeyoung, Balakrishnan, Aravind, Gaurav, Ashish, Czarnecki, Krzysztof, Sedwards, Sean
Machine learning can provide efficient solutions to the complex problems encountered in autonomous driving, but ensuring their safety remains a challenge. A number of authors have attempted to address this issue, but there are few publicly-available tools to adequately explore the trade-offs between functionality, scalability, and safety. We thus present WiseMove, a software framework to investigate safe deep reinforcement learning in the context of motion planning for autonomous driving. WiseMove adopts a modular learning architecture that suits our current research questions and can be adapted to new technologies and new questions. We present the details of WiseMove, demonstrate its use on a common traffic scenario, and describe how we use it in our ongoing safe learning research.
Divergence-Based Motivation for Online EM and Combining Hidden Variable Models
Amid, Ehsan, Warmuth, Manfred K.
Expectation-Maximization (EM) is the fallback method for parameter estimation of hidden (aka latent) variable models. Given the full batch of data, EM forms an upper-bound of the negative log-likelihood of the model at each iteration and then updates to the minimizer of this upper-bound. We introduce a versatile online variant of EM where the data arrives in as a stream. Our motivation is based on the relative entropy divergences between two joint distributions over the hidden and visible variables. We view the EM upper-bound as a Monte Carlo approximation of an expectation and show that the joint relative entropy divergence induces a similar expectation form. As a result, we employ the divergence to the old model as the inertia term to motivate our online EM algorithm. Our motivation is more widely applicable than previous ones and leads to simple online updates for mixture of exponential distributions, hidden Markov models, and the first known online update for Kalman filters. Additionally, the finite sample form of the inertia term lets us derive online updates when there is no closed form solution. Experimentally, sweeping the data with an online update converges much faster than the batch update. Our divergence based methods also lead to a simple way to combine hidden variable models and this immediately gives efficient algorithms for distributed setting.
Latent Space Reinforcement Learning for Steering Angle Prediction
Khan, Qadeer, Schรถn, Torsten, Wenzel, Patrick
Abstract--Model-free reinforcement learning has recently been shown to successfully learn navigation policies from raw sensor data. In this work, we address the problem of learning driving policies for an autonomous agent in a high-fidelity simulator. Building upon recent research that applies deep reinforcement learning to navigation problems, we present a modular deep reinforcement learning approach to predict the steering angle of the car from raw images. The control module trained with reinforcement learning takes the latent vector as input to predict the correct steering angle. The experimental results have showed that our method is capable of learning to maneuver the car without any human control signals. I. INTRODUCTION Reinforcement learning (RL) is gaining interest as a promising avenueto training end-to-end autonomous driving policies. These algorithms have recently been shown to solve complex tasks such as navigation from raw vision-sensor modalities. However, training those algorithms require vast amounts of data and interactions with the environment to cover a wide variety of driving scenarios. The collection of such data if even possible is costly and time-consuming.
Tier-I Indian Institutes Offering Analytics Courses To Bridge AI Talent Gap
In the changing tech scenario in India, noted and well-established institutes have now also started to step forward and train students as well as the professionals in artificial intelligence and machine learning. The institutes are providing both the current needs of algorithms and mathematical insights as well as practical experiences. In this article, we list 5 tier-1 institutes that have added courses on artificial intelligence in India. About The Programme: This institute launched a dual degree specialisation in data science as well as in robotics in the year 2018. Any B.Tech student can enroll in this programme based on the CGPA cut-off of 8.0 at the end of the 5th semester.
Model-Based Detector for SSDs in the Presence of Inter-cell Interference
Yassine, Hachem, Badiu, Mihai-Alin, Coon, Justin
In this paper, we consider the problem of reducing the bit error rate of flash-based solid state drives (SSDs) when cells are subject to inter-cell interference (ICI). By observing that the outputs of adjacent victim cells can be correlated due to common aggressors, we propose a novel channel model to accurately represent the true flash channel. This model, equivalent to a finite-state Markov channel model, allows the use of the sum-product algorithm to calculate more accurate posterior distributions of individual cell inputs given the joint outputs of victim cells. These posteriors can be easily mapped to the log-likelihood ratios that are passed as inputs to the soft LDPC decoder. When the output is available with high precision, our simulation showed that a significant reduction in the bit-error rate can be obtained, reaching $99.99\%$ reduction compared to current methods, when the diagonal coupling is very strong. In the realistic case of low-precision output, our scheme provides less impressive improvements due to information loss in the process of quantization. To improve the performance of the new detector in the quantized case, we propose a new iterative scheme that alternates multiple times between the detector and the decoder. Our simulations showed that the iterative scheme can significantly improve the bit error rate even in the quantized case.