Europe
Learning Local Error Bars for Nonlinear Regression
Nix, David A., Weigend, Andreas S.
We present a new method for obtaining local error bars for nonlinear regression, i.e., estimates of the confidence in predicted values that depend on the input. We approach this problem by applying a maximumlikelihood framework to an assumed distribution of errors. We demonstrate our method first on computer-generated data with locally varying, normally distributed target noise. We then apply it to laser data from the Santa Fe Time Series Competition where the underlying system noise is known quantization error and the error bars give local estimates of model misspecification. In both cases, the method also provides a weightedregression effect that improves generalization performance.
Multidimensional Scaling and Data Clustering
Hofmann, Thomas, Buhmann, Joachim
Visualizing and structuring pairwise dissimilarity data are difficult combinatorial optimization problems known as multidimensional scaling or pairwise data clustering. Algorithms for embedding dissimilarity data set in a Euclidian space, for clustering these data and for actively selecting data to support the clustering process are discussed in the maximum entropy framework. Active data selection provides a strategy to discover structure in a data set efficiently with partially unknown data. 1 Introduction Grouping experimental data into compact clusters arises as a data analysis problem in psychology, linguistics, genetics and other experimental sciences. The data which are supposed to be clustered are either given by an explicit coordinate representation (central clustering) or, in the non-metric case, they are characterized by dissimilarity values for pairs of data points (pairwise clustering). In this paper we study algorithms (i) for embedding non-metric data in a D-dimensional Euclidian space, (ii) for simultaneous clustering and embedding of non-metric data, and (iii) for active data selection to determine a particular cluster structure with minimal number of data queries. All algorithms are derived from the maximum entropy principle (Hertz et al., 1991) which guarantees robust statistics (Tikochinsky et al., 1984).
An Input Output HMM Architecture
Bengio, Yoshua, Frasconi, Paolo
We introduce a recurrent architecture having a modular structure and we formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports recurrent networks processing style and allows to exploit the supervised learning paradigm while using maximum likelihood estimation. 1 INTRODUCTION Learning problems involving sequentially structured data cannot be effectively dealt with static models such as feedforward networks. Recurrent networks allow to model complex dynamical systems and can store and retrieve contextual information in a flexible way. Up until the present time, research efforts of supervised learning for recurrent networks have almost exclusively focused on error minimization by gradient descent methods. Although effective for learning short term memories, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals (Bengio et al., 1994; Mozer, 1992).
Boltzmann Chains and Hidden Markov Models
Saul, Lawrence K., Jordan, Michael I.
Statistical models of discrete time series have a wide range of applications, most notably to problems in speech recognition (Juang & Rabiner, 1991) and molecular biology (Baldi, Chauvin, Hunkapiller, & McClure, 1992). A common problem in these fields is to find a probabilistic model, and a set of model parameters, that 436 Lawrence K. Saul, Michael I. Jordan
An Actor/Critic Algorithm that is Equivalent to Q-Learning
Crites, Robert H., Barto, Andrew G.
We prove the convergence of an actor/critic algorithm that is equivalent to Q-Iearning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using criteria that depend on the relative probability of the action that was executed.
Reinforcement Learning Methods for Continuous-Time Markov Decision Problems
Bradtke, Steven J., Duff, Michael O.
Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(,x), Q-Iearning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.
Instance-Based State Identification for Reinforcement Learning
This paper presents instance-based state identification, an approach to reinforcement learning and hidden state that builds disambiguating amounts of short-term memory online, and also learns with an order of magnitude fewer training steps than several previous approaches. Inspired by a key similarity between learning with hidden state and learning in continuous geometrical spaces, this approach uses instance-based (or "memory-based") learning, a method that has worked well in continuous spaces. 1 BACKGROUND AND RELATED WORK When a robot's next course of action depends on information that is hidden from the sensors because of problems such as occlusion, restricted range, bounded field of view and limited attention, the robot suffers from hidden state. More formally, we say a reinforcement learning agent suffers from the hidden state problem if the agent's state representation is non-Markovian with respect to actions and utility. The hidden state problem arises as a case of perceptual aliasing: the mapping between states of the world and sensations of the agent is not one-to-one [Whitehead, 1992]. If the agent's perceptual system produces the same outputs for two world states in which different actions are required, and if the agent's state representation consists only of its percepts, then the agent will fail to choose correct actions.
Advantage Updating Applied to a Differential Game
Harmon, Mance E., III, Leemon C. Baird, Klopf, A. Harry
An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual gradient form of advantage updating. The game is a Markov Decision Process (MDP) with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. The reinforcement learning algorithm for optimal control is modified for differential games in order to find the minimax point, rather than the maximum. Simulation results are compared to the optimal solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual gradient and non-residual gradient forms of advantage updating and Q-learning are compared. The results show that advantage updating converges faster than Q-learning in all simulations.
Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems
Jaakkola, Tommi, Singh, Satinder P., Jordan, Michael I.
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov assumption is removed, however, neither generally the algorithms nor the analyses continue to be usable. We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. Our algorithm applies to problems in which the environment is Markov, but the learner has restricted access to state information. The algorithm involves a Monte-Carlo policy evaluation combined with a policy improvement method that is similar to that of Markov decision problems and is guaranteed to converge to a local maximum. The algorithm operates in the space of stochastic policies, a space which can yield a policy that performs considerably better than any deterministic policy. Although the space of stochastic policies is continuous-even for a discrete action space-our algorithm is computationally tractable.
A Rigorous Analysis of Linsker-type Hebbian Learning
Feng, J., Pan, H., Roychowdhury, V. P.
We propose a novel rigorous approach for the analysis of Linsker's unsupervised Hebbian learning network. The behavior of this model is determined by the underlying nonlinear dynamics which are parameterized by a set of parameters originating from the Hebbian rule and the arbor density of the synapses. These parameters determine the presence or absence of a specific receptive field (also referred to as a'connection pattern') as a saturated fixed point attractor of the model. In this paper, we perform a qualitative analysis of the underlying nonlinear dynamics over the parameter space, determine the effects of the system parameters on the emergence of various receptive fields, and predict precisely within which parameter regime the network will have the potential to develop a specially designated connection pattern. In particular, this approach exposes, for the first time, the crucial role played by the synaptic density functions, and provides a complete precise picture of the parameter space that defines the relationships among the different receptive fields. Our theoretical predictions are confirmed by numerical simulations.