Markov Models
Boltzmann Machine Learning Using Mean Field Theory and Linear Response Correction
We present a new approximate learning algorithm for Boltzmann Machines, using a systematic expansion of the Gibbs free energy to second order in the weights. The linear response correction to the correlations is given by the Hessian of the Gibbs free energy. The computational complexity of the algorithm is cubic in the number of neurons. We compare the performance of the exact BM learning algorithm with first order (Weiss) mean field theory and second order (TAP) mean field theory. The learning task consists of a fully connected Ising spin glass model on 10 neurons.
The Effect of Eligibility Traces on Finding Optimal Memoryless Policies in Partially Observable Markov Decision Processes
Agents acting in the real world are confronted with the problem of making good decisions with limited knowledge of the environment. Partially observable Markov decision processes (POMDPs) model decision problems in which an agent tries to maximize its reward in the face of limited sensor feedback. Recent work has shown empirically that a reinforcement learning (RL) algorithm called Sarsa(A) can efficiently find optimal memoryless policies, which map current observations to actions, for POMDP problems (Loch and Singh 1998). The Sarsa(A) algorithm uses a form of short-term memory called an eligibility trace, which distributes temporally delayed rewards to observation-action pairs which lead up to the reward. This paper explores the effect of eligibility traces on the ability of the Sarsa(A) algorithm to find optimal memoryless policies.
Markov Processes on Curves for Automatic Speech Recognition
We investigate a probabilistic framework for automatic speech recognition based on the intrinsic geometric properties of curves. In particular, we analyze the setting in which two variables-one continuous (), one discrete (s)-evolve jointly in time. We sup(cid:173) pose that the vector traces out a smooth multidimensional curve and that the variable s evolves stochastically as a function of the arc length traversed along this curve. Since arc length does not depend on the rate at which a curve is traversed, this gives rise to a family of Markov processes whose predictions, Pr[sl ]' are invariant to nonlinear warpings of time. We describe the use of such models, known as Markov processes on curves (MPCs), for automatic speech recognition, where are acoustic feature trajec(cid:173) tories and s are phonetic transcriptions.
Viewing Classifier Systems as Model Free Learning in POMDPs
Classifier systems are now viewed disappointing because of their prob(cid:173) lems such as the rule strength vs rule set performance problem and the credit assignment problem. In order to solve the problems, we have de(cid:173) veloped a hybrid classifier system: GLS (Generalization Learning Sys(cid:173) tem). In designing GLS, we view CSs as model free learning in POMDPs and take a hybrid approach to finding the best generalization, given the total number of rules. GLS uses the policy improvement procedure by Jaakkola et al. for an locally optimal stochastic policy when a set of rule conditions is given. GLS uses GA to search for the best set of rule conditions.
An Entropic Estimator for Structure Discovery
We introduce a novel framework for simultaneous structure and parameter learning in hidden-variable conditional probability models, based on an en tropic prior and a solution for its maximum a posteriori (MAP) estimator. The MAP estimate minimizes uncertainty in all respects: cross-entropy between model and data; entropy of the model; entropy of the data's descriptive statistics. Iterative estimation extinguishes weakly supported parameters, compressing and sparsifying the model. Trimming operators accelerate this process by removing excess parameters and, unlike most pruning schemes, guarantee an increase in posterior probability. Entropic estimation takes a overcomplete random model and simplifies it, inducing the structure of relations between hidden and observed variables.
Tractable Variational Structures for Approximating Graphical Models
Graphical models provide a broad probabilistic framework with ap(cid:173) plications in speech recognition (Hidden Markov Models), medical diagnosis (Belief networks) and artificial intelligence (Boltzmann Machines). However, the computing time is typically exponential in the number of nodes in the graph. Within the variational frame(cid:173) work for approximating these models, we present two classes of dis(cid:173) tributions, decimatable Boltzmann Machines and Tractable Belief Networks that go beyond the standard factorized approach. We give generalised mean-field equations for both these directed and undirected approximations. Simulation results on a small bench(cid:173) mark problem suggest using these richer approximations compares favorably against others previously reported in the literature.
Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes
Partially Observable Markov Decision Processes (pO "MOPs) constitute an important class of reinforcement learning problems which present unique theoretical and computational difficulties. In the absence of the Markov property, popular reinforcement learning algorithms such as Q-Iearning may no longer be effective, and memory-based methods which remove partial observability via state-estimation are notoriously expensive. An alternative approach is to seek a stochastic memoryless policy which for each observation of the environment prescribes a probability distribution over available actions that maximizes the average reward per timestep. A reinforcement learning algorithm which learns a locally optimal stochastic memoryless policy has been proposed by Jaakkola, Singh and Jordan, but not empirically verified. We present a variation of this algorithm, discuss its implementation, and demonstrate its viability using four test problems.
Deep Dive: Modeling Customers Loan Default with Markov Chains
Sometimes machine learning is not the answer. Knowing the mechanisms of a system, we can construct models that answer certain quantitative questions more effectively. In this letter, we look at the customer loan payment process and model it with Markov Chains. This Deep Dive is part of the Data Science Fundamentals series. I once worked in a company that allowed to buy products with a credit.
Learning Nonlinear Dynamical Systems Using an EM Algorithm
The Expectation-Maximization (EM) algorithm is an iterative pro(cid:173) cedure for maximum likelihood parameter estimation from data sets with missing or hidden variables [2]. It has been applied to system identification in linear stochastic state-space models, where the state variables are hidden from the observer and both the state and the parameters of the model have to be estimated simulta(cid:173) neously [9]. We present a generalization of the EM algorithm for parameter estimation in nonlinear dynamical systems. The "expec(cid:173) tation" step makes use of Extended Kalman Smoothing to estimate the state, while the "maximization" step re-estimates the parame(cid:173) ters using these uncertain state estimates. In general, the nonlinear maximization step is difficult because it requires integrating out the uncertainty in the states.
An Environment Model for Nonstationary Reinforcement Learning
Reinforcement learning in nonstationary environments is generally regarded as an important and yet difficult problem. This paper partially addresses the problem by formalizing a subclass of nonsta(cid:173) tionary environments. The environment model, called hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. While HM-MDP is a special case of partially observable Markov decision processes (POMDP), modeling an HM-MDP environment via the more gen(cid:173) eral POMDP model unnecessarily increases the problem complex(cid:173) ity. A variant of the Baum-Welch algorithm is developed for model learning requiring less data and time.