AITopics

We present a method for automatically constructing macro-actions from scratch from primitive actions during the reinforcement learning process. The overall idea is to reinforce the tendency to perform action b after action a if such a pattern of actions has been rewarded. We test the method on a bicycle task, the car-on-the-hill task, the racetrack task and some grid-world tasks. For the bicycle and racetrack tasks the use of macro-actions approximately halves the learning time, while for one of the grid-world tasks the learning time is reduced by a factor of 5. The method did not work for the car-on-the-hill task for reasons we discuss in the conclusion.

agent, mapping, sutton, (15 more...)

Country:

North America > United States > Massachusetts (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
Europe > Denmark > Capital Region > Copenhagen (0.05)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Oyama, Eimei, Tachi, Susumu

Coordinate Transformation Learning of Hand Position Feedback Controller by Using Change of Position Error Norm

"goal-directed"; i.e., there is no direct way

controller, equation, feedback controller, (7 more...)

Country:

Asia > Middle East > Jordan (0.05)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)

Industry: Health & Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Neuneier, Ralph, Mihatsch, Oliver

Risk Sensitive Reinforcement Learning

A directed generative model for binary data using a small number of hidden continuous units is investigated. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images. Introduction Principal Components Analysis (PCA) is a widely used statistical technique for representing data with a large number of variables [1]. It is based upon the assumption that although the data is embedded in a high dimensional vector space, most of the variability in the data is captured by a much lower climensional manifold. In particular for PCA, this manifold is described by a linear hyperplane whose characteristic directions are given by the eigenvectors of the correlation matrix with the largest eigenvalues. The success of PCA and closely related techniques such as Factor Analysis (FA) and PCA mixtures clearly indicate that much real world data exhibit the low dimensional manifold structure assumed by these models [2, 3]. However, the linear manifold structure of PCA is not appropriate for data with binary valued variables.

algorithm, eigenvalue, generative model, (11 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Industry: Banking & Finance > Trading (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Munos, Rémi, Moore, Andrew W.

Barycentric Interpolators for Continuous Space and Time Reinforcement Learning

In order to find the optimal control of continuous state-space and time reinforcement learning (RL) problems, we approximate the value function (VF) with a particular class of functions called the barycentric interpolators. We establish sufficient conditions under which a RL algorithm converges to the optimal VF, even when we use approximate models of the state dynamics and the reinforcement functions.

barycentric interpolator, convergence, equation, (14 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Moll, Robert, Barto, Andrew G., Perkins, Theodore J., Sutton, Richard S.

Learning Instance-Independent Value Functions to Enhance Local Search

Reinforcement learning methods can be used to improve the performance of local search algorithms for combinatorial optimization by learning an evaluation function that predicts the outcome of search. The evaluation function is therefore able to guide search to low-cost solutions better than can the original cost function. We describe a reinforcement learning method for enhancing local search that combines aspects of previous work by Zhang and Dietterich (1995) and Boyan and Moore (1997, Boyan 1998). In an off-line learning phase, a value function is learned that is useful for guiding search for multiple problem sizes and instances. We illustrate our technique by developing several such functions for the Dial-A-Ride Problem. Our learning-enhanced local search algorithm exhibits an improvement of more then 30% over a standard local search algorithm.

algorithm, local search, value function, (14 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

The Effect of Eligibility Traces on Finding Optimal Memoryless Policies in Partially Observable Markov Decision Processes

Loch, John

Agents acting in the real world are confronted with the problem of making good decisions with limited knowledge of the environment. Partially observable Markov decision processes (POMDPs) model decision problems in which an agent tries to maximize its reward in the face of limited sensor feedback. Recent work has shown empirically that a reinforcement learning (RL) algorithm called Sarsa(A) can efficiently find optimal memoryless policies, which map current observations to actions, for POMDP problems (Loch and Singh 1998). The Sarsa(A) algorithm uses a form of short-term memory called an eligibility trace, which distributes temporally delayed rewards to observation-action pairs which lead up to the reward. This paper explores the effect of eligibility traces on the ability of the Sarsa(A) algorithm to find optimal memoryless policies.

eligibility trace, memoryless policy, sarsa, (12 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.05)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Exploring Unknown Environments with Real-Time Search or Reinforcement Learning

Koenig, Sven

Learning Real-Time A* (LRTA*) is a popular control method that interleaves planning and plan execution and has been shown to solve search problems in known environments efficiently. In this paper, we apply LRTA * to the problem of getting to a given goal location in an initially unknown environment. Uninformed LRTA * with maximal lookahead always moves on a shortest path to the closest unvisited state, that is, to the closest potential goal state. This was believed to be a good exploration heuristic, but we show that it does not minimize the worst-case plan-execution time compared to other uninformed exploration methods. This result is also of interest to reinforcement-learning researchers since many reinforcement learning methods use asynchronous dynamic programming, interleave planning and plan execution, and exhibit optimism in the face of uncertainty, just like LRTA *.

execution, goal state, plan-execution time, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kearns, Michael J., Singh, Satinder P.

Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

In this paper, we address two issues of longstanding interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value iteration? We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed.

algorithm, state-action pair, transition, (12 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hayashi, Akira, Suematsu, Nobuo

Viewing Classifier Systems as Model Free Learning in POMDPs

Classifier systems are now viewed disappointing because of their problems such as the rule strength vs rule set performance problem and the credit assignment problem. In order to solve the problems, we have developed a hybrid classifier system: GLS (Generalization Learning System). In designing GLS, we view CSs as model free learning in POMDPs and take a hybrid approach to finding the best generalization, given the total number of rules. GLS uses the policy improvement procedure by Jaakkola et al. for an locally optimal stochastic policy when a set of rule conditions is given. GLS uses GA to search for the best set of rule conditions. 1 INTRODUCTION Classifier systems (CSs) (Holland 1986) have been among the most used in reinforcement learning.

learning, pomdp, viewing classifier system, (13 more...)

Country:

North America > United States > Michigan (0.07)
Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.05)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.97)

Brown, Timothy X., Tong, Hui, Singh, Satinder P.

Optimizing Admission Control while Ensuring Quality of Service in Multimedia Networks via Reinforcement Learning

This paper examines the application of reinforcement learning to a telecommunications networking problem. The problem requires that revenue be maximized while simultaneously meeting a quality of service constraint that forbids entry into certain states. We present a general solution to this multi-criteria problem that is able to earn significantly higher revenues than alternatives.

constraint, qos, revenue, (11 more...)