AITopics

We present a method for automatically constructing macro-actions from primitive actions, from scratch, during the reinforcement learning process. The overall idea is to reinforce the tendency to perform action b after action a if such a pattern of actions has been rewarded. We test the method on a bicycle task, the car-on-the-hill task, the racetrack task, and several grid-world tasks. For the bicycle and racetrack tasks, the use of macro-actions approximately halves the learning time, while for one of the grid-world tasks the learning time is reduced by a factor of 5. The method did not work for the car-on-the-hill task, for reasons we discuss in the conclusion.

1 INTRODUCTION

A macro-action is a sequence of actions chosen from the primitive actions of the problem.
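The core idea is to reinforce the tendency to take action b right after action a when that pair has been rewarded. One way to sketch this is a table of pairwise strengths that biases greedy action selection. This is a minimal sketch under assumed details: the class name `MacroBias`, the additive bias on Q-values, and the `beta`/`decay` update rule are illustrative choices, not the authors' exact formulation.

```python
class MacroBias:
    """Hypothetical pairwise bias m[a][b]: tendency to pick b right after a."""

    def __init__(self, n_actions, beta=0.1, decay=0.99):
        self.n = n_actions
        self.beta = beta    # assumed step size for reinforcing a rewarded pair
        self.decay = decay  # assumed forgetting factor applied to the whole row
        self.m = [[0.0] * n_actions for _ in range(n_actions)]

    def update(self, prev_action, action, reward):
        # Decay every strength out of prev_action, then reinforce the pair
        # (prev_action, action) in proportion to the reward that followed it.
        row = self.m[prev_action]
        for b in range(self.n):
            row[b] *= self.decay
        row[action] += self.beta * reward

    def select(self, q_values, prev_action=None):
        # Greedy choice over Q-values plus the macro bias from the last action;
        # ties go to the lowest action index.
        bias = self.m[prev_action] if prev_action is not None else [0.0] * self.n
        scores = [q + w for q, w in zip(q_values, bias)]
        return max(range(self.n), key=lambda b: scores[b])
```

After ten calls to `mb.update(0, 1, 1.0)` on `mb = MacroBias(3)`, `mb.select([0.0, 0.0, 0.0], prev_action=0)` returns 1. A sufficiently large Q-value advantage still overrides the learned bias, as with `[0.0, 0.0, 5.0]`.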

artificial intelligence, machine learning, reinforcement learning

Country: Europe > Denmark (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Exploring Unknown Environments with Real-Time Search or Reinforcement Learning

Koenig, Sven

Learning Real-Time A* (LRTA*) is a popular control method that interleaves planning and plan execution and has been shown to solve search problems in known environments efficiently. In this paper, we apply LRTA* to the problem of getting to a given goal location in an initially unknown environment. Uninformed LRTA* with maximal lookahead always moves on a shortest path to the closest unvisited state, that is, to the closest potential goal state. This was believed to be a good exploration heuristic, but we show that it does not minimize the worst-case plan-execution time compared to other uninformed exploration methods. This result is also of interest to reinforcement-learning researchers, since many reinforcement learning methods use asynchronous dynamic programming, interleave planning and plan execution, and exhibit optimism in the face of uncertainty, just like LRTA*.
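A minimal sketch of uninformed LRTA* makes the exploration behaviour concrete. It assumes unit edge costs and an explicit successor list that the agent observes only for the state it currently occupies. It also uses a one-step lookahead rather than the maximal lookahead analyzed in the paper. `lrta_star` and its arguments are hypothetical names.

```python
def lrta_star(neighbors, start, goal, max_steps=10_000):
    """Uninformed LRTA* with one-step lookahead and unit edge costs.

    neighbors[s] lists the successors the agent observes when it stands in s;
    h maps states to cost-to-goal estimates and defaults to 0 (uninformed).
    """
    h = {}
    s = start
    path = [s]
    for _ in range(max_steps):
        if s == goal:
            return path, h
        # Look one step ahead: pick the successor minimizing cost + estimate
        # (ties go to the first successor listed).
        best = min(neighbors[s], key=lambda t: 1 + h.get(t, 0))
        # Learning step: set h(s) to the best one-step lookahead value.
        h[s] = 1 + h.get(best, 0)
        s = best
        path.append(s)
    return path, h  # step budget exhausted without reaching the goal
```

On the graph `{0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}` with start 0 and goal 3, the agent first wanders into the dead end at 1. It raises its estimate there and then backtracks, producing the path `[0, 1, 0, 2, 3]`.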

machine learning, plan-execution time, reinforcement learning

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kearns, Michael J., Singh, Satinder P.

Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

In this paper, we address two issues of longstanding interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value iteration? We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed.
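The two approaches can be sketched side by side under a parallel-sampling assumption: each round draws one next state for every state-action pair from a generative model. Both learners see the same samples. Synchronous Q-learning with step size 1/k updates its estimates directly. The indirect learner counts transitions, forms empirical next-state distributions, and runs value iteration offline. The MDP encoding, function names, step-size schedule, and round structure are assumptions of this sketch, not the paper's exact algorithms.

```python
import random


def value_iteration(P, R, gamma, iters=500):
    """Q-values for a model where P[s][a] = {next_state: prob}, R[s][a] = reward."""
    n_s, n_a = len(P), len(P[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        V = [max(row) for row in Q]
        Q = [[R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())
              for a in range(n_a)] for s in range(n_s)]
    return Q


def sample_next(P, s, a, rng):
    # Draw a next state from the categorical distribution P[s][a].
    u, acc = rng.random(), 0.0
    for t, p in P[s][a].items():
        acc += p
        if u < acc:
            return t
    return t  # guards against probabilities summing to slightly under 1


def direct_vs_indirect(P, R, gamma, rounds, seed=0):
    """Feed identical parallel samples to Q-learning and to an indirect learner."""
    rng = random.Random(seed)
    n_s, n_a = len(P), len(P[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    counts = [[{} for _ in range(n_a)] for _ in range(n_s)]
    for k in range(1, rounds + 1):
        alpha = 1.0 / k
        V = [max(row) for row in Q]  # values frozen for this round (synchronous)
        for s in range(n_s):
            for a in range(n_a):
                t = sample_next(P, s, a, rng)
                counts[s][a][t] = counts[s][a].get(t, 0) + 1
                Q[s][a] += alpha * (R[s][a] + gamma * V[t] - Q[s][a])
    # Indirect approach: empirical next-state distributions, then value iteration.
    P_hat = [[{t: c / rounds for t, c in counts[s][a].items()} for a in range(n_a)]
             for s in range(n_s)]
    return Q, value_iteration(P_hat, R, gamma)
```

On a deterministic MDP the empirical model is exact after one round. The indirect estimate therefore matches value iteration on the true model, which makes it a convenient reference for measuring the Q-learning error.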

artificial intelligence, machine learning, reinforcement learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Brown, Timothy X., Tong, Hui, Singh, Satinder P.

Optimizing Admission Control while Ensuring Quality of Service in Multimedia Networks via Reinforcement Learning

This paper examines the application of reinforcement learning to a telecommunications networking problem. The problem requires that revenue be maximized while simultaneously meeting a quality of service constraint that forbids entry into certain states. We present a general solution to this multi-criteria problem that is able to earn significantly higher revenues than alternatives.
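Because forbidden states must never be entered, one simple way to combine a learned policy with the constraint is to mask out any action that would leave the allowed region before choosing greedily. In this sketch a link-capacity check stands in for the paper's quality-of-service constraint. `load`, `admit`, the Q-table keying, and the per-class bandwidths are all hypothetical.

```python
def load(state, bandwidth):
    # Total bandwidth in use given the number of active calls of each class.
    return sum(n * b for n, b in zip(state, bandwidth))


def admit(Q, state, call_class, capacity, bandwidth):
    """Return 1 (accept) or 0 (reject) for an arriving call of `call_class`.

    Acceptance is only considered when the resulting state stays inside the
    allowed region, so learned Q-values can never steer the system into a
    forbidden state. Unseen (state, class, action) entries default to 0.
    """
    actions = [0]  # rejecting is always allowed
    after = list(state)
    after[call_class] += 1
    if load(after, bandwidth) <= capacity:
        actions.append(1)
    # Greedy over allowed actions; ties favour rejection (listed first).
    return max(actions, key=lambda a: Q.get((tuple(state), call_class, a), 0.0))
```

Masking keeps constraint satisfaction independent of how well the Q-values have been learned, so revenue maximization only operates within the permitted states.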

constraint, machine learning, reinforcement learning

Country: North America > United States > Colorado > Boulder County > Boulder (0.14)

Genre: Research Report (0.34)

Industry: Telecommunications (0.67)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Baird III, Leemon C., Moore, Andrew W.

Gradient Descent for General Reinforcement Learning

A simple learning rule, the VAPS algorithm, is derived that can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. These algorithms all have guaranteed convergence, and include modifications of several existing algorithms that were known to fail to converge on simple MDPs, including Q-learning, SARSA, and advantage learning. In addition to these value-based algorithms, the rule also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function. Finally, it allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search (VAPS) algorithm.

artificial intelligence, machine learning, reinforcement learning

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.95)

Wolpert, David, Tumer, Kagan, Frank, Jeremy

Using Collective Intelligence to Route Internet Traffic

A COllective INtelligence (COIN) is a set of interacting reinforcement learning (RL) algorithms designed in an automated fashion so that their collective behavior optimizes a global utility function. We summarize the theory of COINs, then present experiments using that theory to design COINs to control internet traffic routing. These experiments indicate that COINs outperform all previously investigated RL-based shortest-path routing algorithms.

1 INTRODUCTION

COllective INtelligences (COINs) are large, sparsely connected recurrent neural networks whose "neurons" are reinforcement learning (RL) algorithms. The distinguishing feature of COINs is that their dynamics involve no centralized control, only the collective effects of the individual neurons, each modifying its behavior via its own RL algorithm. This restriction holds even though the goal of the COIN concerns the system's global behavior.
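Q-routing (Boyan and Littman) is one well-known example of RL-based shortest-path routing. In Q-routing, each node learns per-neighbour delivery-time estimates from its neighbours' own estimates. The sketch illustrates that kind of baseline, not the COIN design itself. It assumes fixed link delays, no queueing, and synchronous sweeps in place of packet-driven updates. `q_routing` and `next_hop` are hypothetical names.

```python
def q_routing(links, eta=0.5, sweeps=500):
    """Q-routing-style delay estimates on a fixed network.

    links[x][y] is the (assumed constant) delay of the link x -> y.
    Q[x][d][y] estimates the time for a packet at x bound for d if it is
    forwarded to neighbour y; queueing delay is ignored in this sketch.
    """
    nodes = list(links)
    Q = {x: {d: {y: 0.0 for y in links[x]} for d in nodes if d != x}
         for x in nodes}
    for _ in range(sweeps):
        for x in nodes:
            for d in Q[x]:
                for y, delay in links[x].items():
                    # y's own best estimate of the remaining time to d
                    remaining = 0.0 if y == d else min(Q[y][d].values())
                    Q[x][d][y] += eta * (delay + remaining - Q[x][d][y])
    return Q


def next_hop(Q, x, d):
    # Greedy routing: forward to the neighbour with the lowest estimate.
    return min(Q[x][d], key=Q[x][d].get)
```

Each node uses only its neighbours' estimates, so routing decisions are decentralized. Every node still greedily minimizes its own delay estimate rather than a global utility, and that is the gap COINs are designed to address.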

artificial intelligence, machine learning, reinforcement learning

Country: North America > United States (0.49)

Industry:

Government > Space Agency (0.31)
Government > Regional Government > North America Government > United States Government (0.31)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Moody, John E., Saffell, Matthew

Reinforcement Learning for Trading

In this paper, we propose to use recurrent reinforcement learning to directly optimize trading system performance functions, and we compare two different reinforcement learning methods. The first, Recurrent Reinforcement Learning, uses immediate rewards to train the trading systems, while the second, Q-Learning (Watkins 1989), approximates discounted future rewards. These methodologies can be applied to optimizing systems designed to trade a single security or to trade portfolios. In addition, we propose a novel value function for risk-adjusted return that enables learning to be done online: the differential Sharpe ratio. Trading system profits depend upon sequences of interdependent decisions and are thus path-dependent. Optimal trading decisions, when the effects of transaction costs, market impact, and taxes are included, require knowledge of the current system state. In Moody, Wu, Liao & Saffell (1998), we demonstrate that reinforcement learning provides a more elegant and effective means than more standard supervised approaches for training trading systems when transaction costs are included.
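The differential Sharpe ratio can be computed online from exponential moving estimates A and B of the first and second moments of the returns R_t, with adaptation rate eta. In its commonly stated form, D_t = (B_{t-1} * dA_t - 0.5 * A_{t-1} * dB_t) / (B_{t-1} - A_{t-1}^2)^(3/2), where dA_t = R_t - A_{t-1} and dB_t = R_t^2 - B_{t-1}. Two details are assumptions of this sketch: the moments start at zero, and D_t is reported as 0 until the variance estimate becomes positive.

```python
def differential_sharpe(returns, eta=0.01):
    """Differential Sharpe ratio D_t for a stream of per-period returns R_t.

    A and B are exponential moving estimates of the first and second moments
    of the returns, both started at 0 (an assumption of this sketch).
    """
    A = B = 0.0
    out = []
    for R in returns:
        dA = R - A          # dA_t = R_t - A_{t-1}
        dB = R * R - B      # dB_t = R_t^2 - B_{t-1}
        var = B - A * A     # variance estimate from the previous step
        if var > 0:
            out.append((B * dA - 0.5 * A * dB) / var ** 1.5)
        else:
            out.append(0.0)  # undefined until the variance estimate is positive
        A += eta * dA
        B += eta * dB
    return out
```

Because each D_t depends only on the current return and the previous moment estimates, it can serve as a per-step reward for online learning. For example, `differential_sharpe([1.0, 1.0], eta=0.1)` gives 0.0 for the first step and 5/3 for the second.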

artificial intelligence, machine learning, reinforcement learning

Country: North America > United States > Oregon (0.14)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

McGovern, Amy, Moss, J. Eliot B.

Scheduling Straight-Line Code Using Reinforcement Learning and Rollouts

In 1986, Tanner and Mead [1] implemented an interesting constraint satisfaction circuit for global motion sensing in aVLSI. We report here a new and improved aVLSI implementation that provides smooth optical flow as well as global motion in a two-dimensional visual field. The computation of optical flow is an ill-posed problem, which manifests itself as the aperture problem. However, the optical flow can be estimated by the use of regularization methods, in which additional constraints are introduced in terms of a global energy functional that must be minimized. We show how the algorithmic constraints of Horn and Schunck [2] on computing smooth optical flow can be mapped onto the physical constraints of an equivalent electronic network.
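The regularized Horn and Schunck formulation is usually solved iteratively. Each pixel's flow is pulled toward the local average of its neighbours and then corrected along the image gradient to reduce the brightness-constancy error I_x u + I_y v + I_t. This plain software version rests on assumptions: a 4-neighbour average with replicated borders instead of Horn and Schunck's weighted neighbourhood, precomputed derivative arrays, and the hypothetical names `horn_schunck` and `alpha`. It does not model the analog network.

```python
def horn_schunck(Ix, Iy, It, alpha=1.0, iters=100):
    """Iterative Horn-Schunck-style optical flow on small 2-D lists.

    Ix, Iy, It are precomputed spatial and temporal brightness derivatives;
    alpha weights the smoothness term of the energy functional.
    """
    rows, cols = len(Ix), len(Ix[0])
    u = [[0.0] * cols for _ in range(rows)]
    v = [[0.0] * cols for _ in range(rows)]

    def local_mean(f, r, c):
        # 4-neighbour average; out-of-range neighbours are replaced by edge values
        return (f[max(r - 1, 0)][c] + f[min(r + 1, rows - 1)][c]
                + f[r][max(c - 1, 0)] + f[r][min(c + 1, cols - 1)]) / 4.0

    for _ in range(iters):
        u_bar = [[local_mean(u, r, c) for c in range(cols)] for r in range(rows)]
        v_bar = [[local_mean(v, r, c) for c in range(cols)] for r in range(rows)]
        for r in range(rows):
            for c in range(cols):
                ix, iy, it = Ix[r][c], Iy[r][c], It[r][c]
                # Brightness-constancy residual at the smoothed estimate,
                # scaled by the regularized squared gradient magnitude.
                k = (ix * u_bar[r][c] + iy * v_bar[r][c] + it) / (alpha ** 2 + ix ** 2 + iy ** 2)
                u[r][c] = u_bar[r][c] - ix * k
                v[r][c] = v_bar[r][c] - iy * k
    return u, v
```

Take a uniform pattern with I_x = 1, I_y = 0, I_t = -1. The iteration converges to u = 1 at every pixel, while v stays at 0. The data term says nothing about v there, which is the aperture problem in miniature: only the smoothness term fixes it.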

artificial intelligence, machine learning, reinforcement learning

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.90)

Moriarty, D. E., Schultz, A. C., Grefenstette, J. J.

Evolutionary Algorithms for Reinforcement Learning

Journal of Artificial Intelligence Research, Sep-1-1999

There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications.
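To make the policy-space approach concrete, a bare-bones genetic algorithm can evolve a lookup-table policy directly. It uses total episode reward as fitness and never estimates a value function. The corridor task, truncation selection, one-point crossover, bit-flip mutation, elitism, and all parameter values are illustrative assumptions, not choices taken from the survey.

```python
import random


def fitness(policy, max_steps=20):
    """Episode return in a corridor: start at cell 0, goal at the last cell.

    policy[s] is 1 (move right) or 0 (move left); each step costs -1.
    """
    goal = len(policy) - 1
    s = 0
    for step in range(max_steps):
        if s == goal:
            return -step
        s = min(max(s + (1 if policy[s] else -1), 0), goal)
    return -max_steps


def evolve(length=6, pop_size=20, generations=30, mut_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0][:]]           # elitism: keep the best unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mut_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness), history
```

With elitism, the best fitness per generation never decreases. The all-right policy scores -5, the best possible on this corridor, and credit is assigned to whole policies rather than to individual actions.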

evolutionary algorithm, reinforcement learning


doi: 10.1613/jair.613

AI Access Foundation

10240


Genre: Overview (0.53)

Industry: Education > Focused Education > Special Education (0.44)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)