Temporally Asymmetric Hebbian Learning, Spike Timing and Neural Response Variability
Recent experimental data indicate that the strengthening or weakening of synaptic connections between neurons depends on the relative timing of pre- and postsynaptic action potentials. A Hebbian synaptic modification rule based on these data leads to a stable state in which the excitatory and inhibitory inputs to a neuron are balanced, producing an irregular pattern of firing. It has been proposed that neurons in vivo operate in such a mode.
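As an illustration of what a timing-dependent modification rule can look like, here is a minimal Python sketch. The abstract does not give the functional form, so the exponential window, the amplitudes `a_plus` and `a_minus`, and the time constants are all assumptions chosen for illustration, not the authors' values.

```python
import math

def stdp_weight_change(dt, a_plus=0.005, a_minus=0.00525,
                       tau_plus=20.0, tau_minus=20.0):
    """Weight change for one spike pair under an assumed exponential window.

    dt is t_post - t_pre in milliseconds. All parameter values are
    hypothetical and not taken from the abstract.
    """
    if dt > 0:
        # Presynaptic spike precedes the postsynaptic spike: strengthen.
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        # Presynaptic spike follows the postsynaptic spike: weaken.
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0
```

In this sketch the sign of the change depends only on the order of the two spikes, and its magnitude decays as the spikes move further apart in time.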
Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes
Williams, John K., Singh, Satinder P.
Partially Observable Markov Decision Processes (POMDPs) constitute an important class of reinforcement learning problems which present unique theoretical and computational difficulties. In the absence of the Markov property, popular reinforcement learning algorithms such as Q-learning may no longer be effective, and memory-based methods which remove partial observability via state-estimation are notoriously expensive. An alternative approach is to seek a stochastic memoryless policy which for each observation of the environment prescribes a probability distribution over available actions that maximizes the average reward per timestep. A reinforcement learning algorithm which learns a locally optimal stochastic memoryless policy has been proposed by Jaakkola, Singh and Jordan, but not empirically verified. We present a variation of this algorithm, discuss its implementation, and demonstrate its viability using four test problems.
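A stochastic memoryless policy can be pictured as a lookup table from the current observation to a probability distribution over actions. The sketch below shows only that data structure and how an action is sampled from it; the observation and action names are hypothetical, and the learning algorithm itself is not reproduced here.

```python
import random

def sample_action(policy, observation, rng=random):
    """Draw an action from the distribution attached to this observation.

    The policy uses no memory: only the current observation matters.
    """
    actions, probs = zip(*policy[observation].items())
    return rng.choices(actions, weights=probs, k=1)[0]

# Hypothetical policy for a small navigation task (names are illustrative).
policy = {
    "wall_ahead": {"left": 0.5, "right": 0.5},
    "open": {"forward": 0.9, "left": 0.05, "right": 0.05},
}
```

Randomizing in an ambiguous observation such as `"wall_ahead"` lets the agent avoid committing to a single action that would be wrong in some of the hidden states sharing that observation.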
A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory
Suematsu, Nobuo, Hayashi, Akira
We have proved that the model learned by BLHT converges to the optimal model in a given hypothesis space, H, which provides the most accurate predictions of percepts and rewards, given short-term memory. We believe this fact provides a solid basis for BLHT, and BLHT can be compared favorably with other methods using short-term memory.
Reinforcement Learning Based on On-Line EM Algorithm
On the other hand, applications to continuous state/action problems (Werbos, 1990; Doya, 1996; Sofge & White, 1992) are much more difficult than the finite state/action cases. Good function approximation methods and fast learning algorithms are crucial for successful applications. In this article, we propose a new RL method that has the above-mentioned two features. This method is based on an actor-critic architecture (Barto et al., 1983), although the detailed implementations of the actor and the critic are quite different from those in the original actor-critic model. The actor and the critic in our method estimate a policy and a Q-function, respectively, and are approximated by Normalized Gaussian Networks (NGnet) (Moody & Darken, 1989).
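To make the idea of a normalized Gaussian network concrete, the following Python sketch computes the output of a one-dimensional network in which each Gaussian unit owns a local linear model. The scalar input, the per-unit linear models, and all parameter names are assumptions made for illustration; the abstract does not specify the network at this level of detail.

```python
import math

def ngnet_predict(x, centers, widths, slopes, offsets):
    """Output of a 1-D normalized Gaussian network (illustrative sketch).

    Each unit i has a Gaussian activation centered at centers[i] and a
    local linear model slopes[i] * x + offsets[i]. The activations are
    normalized to sum to one and used to blend the local models.
    """
    activations = [math.exp(-0.5 * ((x - c) / s) ** 2)
                   for c, s in zip(centers, widths)]
    total = sum(activations)  # may underflow to 0 far from every center
    weights = [a / total for a in activations]
    return sum(w * (m * x + b)
               for w, m, b in zip(weights, slopes, offsets))
```

Because the weights sum to one, the output near a unit's center is dominated by that unit's local model, and the blend changes smoothly between neighboring units.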
Risk Sensitive Reinforcement Learning
Neuneier, Ralph, Mihatsch, Oliver
A directed generative model for binary data using a small number of hidden continuous units is investigated. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images.
Introduction
Principal Components Analysis (PCA) is a widely used statistical technique for representing data with a large number of variables [1]. It is based upon the assumption that although the data is embedded in a high dimensional vector space, most of the variability in the data is captured by a much lower dimensional manifold. In particular for PCA, this manifold is described by a linear hyperplane whose characteristic directions are given by the eigenvectors of the correlation matrix with the largest eigenvalues. The success of PCA and closely related techniques such as Factor Analysis (FA) and PCA mixtures clearly indicates that much real world data exhibit the low dimensional manifold structure assumed by these models [2, 3]. However, the linear manifold structure of PCA is not appropriate for data with binary valued variables.
Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms
Kearns, Michael J., Singh, Satinder P.
In this paper, we address two issues of longstanding interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value iteration? We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed.
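For reference, the standard tabular Q-learning update that such guarantees concern can be sketched as follows. The dictionary-based table, the default learning rate `alpha`, and the discount factor `gamma` are illustrative assumptions, not settings from the paper.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action) pairs to estimates; missing entries count as 0.
    alpha and gamma are hypothetical defaults chosen for illustration.
    """
    # Value of the best action available in the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    td_target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    # Move the old estimate a fraction alpha toward the target.
    q[(state, action)] = old + alpha * (td_target - old)
    return q[(state, action)]
```

An indirect (model-based) method would instead use the same observed transitions to estimate next-state distributions and then run value iteration off-line on the estimated model.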
Viewing Classifier Systems as Model Free Learning in POMDPs
Hayashi, Akira, Suematsu, Nobuo
Classifier systems are now viewed as disappointing because of problems such as the rule strength vs rule set performance problem and the credit assignment problem. In order to solve these problems, we have developed a hybrid classifier system: GLS (Generalization Learning System). In designing GLS, we view CSs as model free learning in POMDPs and take a hybrid approach to finding the best generalization, given the total number of rules. GLS uses the policy improvement procedure by Jaakkola et al. to find a locally optimal stochastic policy when a set of rule conditions is given. GLS uses GA to search for the best set of rule conditions.
1 INTRODUCTION
Classifier systems (CSs) (Holland 1986) have been among the most used in reinforcement learning.
Optimizing Admission Control while Ensuring Quality of Service in Multimedia Networks via Reinforcement Learning
Brown, Timothy X., Tong, Hui, Singh, Satinder P.
This paper examines the application of reinforcement learning to a telecommunications networking problem. The problem requires that revenue be maximized while simultaneously meeting a quality of service constraint that forbids entry into certain states. We present a general solution to this multi-criteria problem that is able to earn significantly higher revenues than alternatives.
Robust, Efficient, Globally-Optimized Reinforcement Learning with the Parti-Game Algorithm
Al-Ansari, Mohammad A., Williams, Ronald J.
Parti-game (Moore 1994a; Moore 1994b; Moore and Atkeson 1995) is a reinforcement learning (RL) algorithm that has a lot of promise in overcoming the curse of dimensionality that can plague RL algorithms when applied to high-dimensional problems. In this paper we introduce modifications to the algorithm that further improve its performance and robustness. In addition, while parti-game solutions can be improved locally by standard local path-improvement techniques, we introduce an add-on algorithm in the same spirit as parti-game that instead tries to improve solutions in a non-local manner.
1 INTRODUCTION
Parti-game operates on goal problems by dynamically partitioning the space into hyperrectangular cells of varying sizes, represented using a k-d tree data structure. It assumes the existence of a pre-specified local controller that can be commanded to proceed from the current state to a given state. The algorithm uses a game-theoretic approach to assign costs to cells based on past experiences using a minimax algorithm.