Undirected Networks
Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes
Williams, John K., Singh, Satinder P.
Partially Observable Markov Decision Processes (pO"MOPs) constitute an important class of reinforcement learning problems which present unique theoretical and computational difficulties. In the absence of the Markov property, popular reinforcement learning algorithms such as Q-Iearning may no longer be effective, and memory-based methods which remove partial observability via state-estimation are notoriously expensive. An alternative approach is to seek a stochastic memoryless policy which for each observation of the environment prescribes a probability distribution over available actions that maximizes the average reward per timestep. A reinforcement learning algorithm which learns a locally optimal stochastic memoryless policy has been proposed by Jaakkola, Singh and Jordan, but not empirically verified. We present a variation of this algorithm, discuss its implementation, and demonstrate its viability using four test problems.
Viewing Classifier Systems as Model Free Learning in POMDPs
Hayashi, Akira, Suematsu, Nobuo
Classifier systems are now viewed disappointing because of their problems suchas the rule strength vs rule set performance problem and the credit assignment problem. In order to solve the problems, we have developed ahybrid classifier system: GLS (Generalization Learning System). In designing GLS, we view CSs as model free learning in POMDPs and take a hybrid approach to finding the best generalization, given the total number of rules. GLS uses the policy improvement procedure by Jaakkola et al. for an locally optimal stochastic policy when a set of rule conditions is given. GLS uses GA to search for the best set of rule conditions. 1 INTRODUCTION Classifier systems (CSs) (Holland 1986) have been among the most used in reinforcement learning.
Gradient Descent for General Reinforcement Learning
III, Leemon C. Baird, Moore, Andrew W.
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcementlearning algorithms.These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. These algorithms all have guaranteed convergence, and include modifications of several existing algorithms that were known to fail to converge on simple MOPs. These include Q learning, SARSA, and advantage learning. In addition to these value-based algorithms it also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function. In addition, it allows policysearch andvalue-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search (V APS) algorithm.
Graphical Models for Recognizing Human Interactions
Oliver, Nuria, Rosario, Barbara, Pentland, Alex
We describe a real-time computer vision and machine learning system for modeling and recognizing human behaviors in two different scenarios: (1) complex, twohanded actionrecognition in the martial art of Tai Chi and (2) detection and recognition of individual human behaviors and multiple-person interactions in a visual surveillance task. In the latter case, the system is particularly concerned with detecting when interactions between people occur, and classifying them. Graphical models, such as Hidden Markov Models (HMMs) [6] and Coupled Hidden MarkovModels (CHMMs) [3, 2], seem appropriate for modeling and, classifying human behaviors because they offer dynamic time warping, a well-understood training algorithm, and a clear Bayesian semantics for both individual (HMMs) and interacting or coupled (CHMMs) generative processes. A major problem with this data-driven statistical approach, especially when modeling rare or anomalous behaviors, is the limited number of training examples. A major emphasis of our work, therefore, is on efficient Bayesian integration of both prior knowledge with evidence from data.
Markov Processes on Curves for Automatic Speech Recognition
Saul, Lawrence K., Rahim, Mazin G.
To formulate a probabilistic model of this process, we consider two variables-one continuous (x), one discrete (s)-that evolve jointly in time. Thus the vector x traces out a smooth multidimensional curve, to each point of which the variable s attaches a discrete label. Markov processes on curves are based on the concept of arc length. After reviewing how to compute arc lengths along curves, we introduce a family of Markov processes whose predictions are invariant to nonlinear warpings of time. We then consider the ways in which these processes (and various generalizations) differ from HMMs. Markov Processes on Curves for Automatic Speech Recognition 753 2.1 Arc length Let g() define a D x D matrix-valued function over x E RP. If g() is everywhere nonnegative definite, then we can use it as a metric to compute distances along curves.
Maximum-Likelihood Continuity Mapping (MALCOM): An Alternative to HMMs
Nix, David A., Hogden, John E.
We describe Maximum-Likelihood Continuity Mapping (MALCOM), an alternative to hidden Markov models (HMMs) for processing sequence data such as speech. While HMMs have a discrete "hidden" space constrained bya fixed finite-automaton architecture, MALCOM has a continuous hidden space-a continuity map-that is constrained only by a smoothness requirement on paths through the space. MALCOM fits into the same probabilistic framework for speech recognition as HMMs, but it represents a more realistic model of the speech production process. To evaluate the extent to which MALCOM captures speech production information, we generated continuous speech continuity maps for three speakers and used the paths through them to predict measured speech articulator data. The median correlation between the MALCOM paths obtained from only the speech acoustics and articulator measurements was 0.77 on an independent test set not used to train MALCOM or the predictor.
Controlling the Complexity of HMM Systems by Regularization
Neukirchen, Christoph, Rigoll, Gerhard
This paper introduces a method for regularization ofHMM systems that avoids parameter overfitting caused by insufficient training data. Regularization isdone by augmenting the EM training method by a penalty term that favors simple and smooth HMM systems. The penalty term is constructed as a mixture model of negative exponential distributions that is assumed to generate the state dependent emission probabilities of the HMMs. This new method is the successful transfer of a well known regularization approach in neural networks to the HMM domain and can be interpreted as a generalization of traditional state-tying for HMM systems. Theeffect of regularization is demonstrated for continuous speech recognition tasks by improving overfitted triphone models and by speaker adaptation with limited training data. 1 Introduction One general problem when constructing statistical pattern recognition systems is to ensure the capability to generalize well, i.e. the system must be able to classify data that is not contained in the training data set.
Learning Nonlinear Dynamical Systems Using an EM Algorithm
Ghahramani, Zoubin, Roweis, Sam T.
The Expectation-Maximization (EM) algorithm is an iterative procedure formaximum likelihood parameter estimation from data sets with missing or hidden variables [2]. It has been applied to system identification in linear stochastic state-space models, where the state variables are hidden from the observer and both the state and the parameters of the model have to be estimated simultaneously [9].We present a generalization of the EM algorithm for parameter estimation in nonlinear dynamical systems. The "expectation" stepmakes use of Extended Kalman Smoothing to estimate the state, while the "maximization" step re-estimates the parameters usingthese uncertain state estimates. In general, the nonlinear maximization step is difficult because it requires integrating out the uncertainty in the states. However, if Gaussian radial basis function (RBF)approximators are used to model the nonlinearities, the integrals become tractable and the maximization step can be solved via systems of linear equations.