AITopics

Probability models can be used to predict outcomes and compensate for missing data, but even a perfect model cannot be used to make decisions unless the utility of the outcomes, or preferences between them, are also provided. This arises in many real-world problems, such as medical diagnosis, wherethe cost of the test as well as the expected improvement in the outcome must be considered. Relatively little work has been done on learning the utilities of outcomes for optimal decision making. In this paper, we show how temporal-difference reinforcement learning (TO(A» can be used to determine decision theoretic utilities within the context of a mixture model and apply this new approach to a problem in medical diagnosis. TO(A) learning of utilities reduces the number of tests that have to be done to achieve the same level of performance compared with the probability model alone, which results in significant cost savings and increased efficiency.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Industry:

Health & Medicine > Diagnostic Medicine (0.87)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.33)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Pollack, Jordan B., Blair, Alan D.

Why did TD-Gammon Work?

Although TD-Gammon is one of the major successes in machine learning, it has not led to similar impressive breakthroughs in temporal difference We werelearning for other applications or even other games. Instead we apply simple hill-climbing in a relative fitness environment. These results and further analysis suggest of Tesauro's program had more to do with thethat the surprising success of the learning task and the dynamics of theco-evolutionary structure backgammon game itself. 1 INTRODUCTION It took great chutzpah for Gerald Tesauro to start wasting computer cycles on temporal of Backgammon (Tesauro, 1992). After all, the dream ofprogram play itself in the hopes computers mastering a domain by self-play or "introspection" had been around since the early days of AI, forming part of Samuel's checker player (Samuel, 1959) and used in Donald Michie's MENACE tictac-toe learner (Michie, 1961). However such self-conditioning or nonexistent internal representations, had generally beensystems, with weak of scale and abandoned by the field of AI.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Industry: Leisure & Entertainment > Games > Backgammon (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games > Backgammon (1.00)

Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes

Pareigis, Stephan

The optimal control problem reduces to a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton Jacobi-Bellman (HJB-) type. Numerical analysis provides multigrid methodsfor this kind of equation. In the case of Learning Control, however,the systems of equations on the various grid-levels are obtained using observed information (transitions and local cost). To ensure consistency, special attention needs to be directed toward thetype of time and space discretization during the observation. Analgorithm for multi-grid observation is proposed.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: Europe > Germany (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Singh, Satinder P., Dayan, Peter

Analytical Mean Squared Error Curves in Temporal Difference Learning

We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its stepsize andeligibility trace parameters. 1 INTRODUCTION A reassuring theory of asymptotic convergence is available for many reinforcement learning (RL) algorithms. What is not available, however, is a theory that explains the finite-term learning curve behavior of RL algorithms, e.g., what are the different kinds of learning curves, what are their key determinants, and how do different problem parameters effect rate of convergence. Answering these questions is crucial not only for making useful comparisons between algorithms, but also for developing hybrid and new RL methods. In this paper we provide preliminary answers to some of the above questions for the case of absorbing Markov chains, where mean square error between the estimated and true predictions is used as the quantity of interest in learning curves.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Tsitsiklis, John N., Roy, Benjamin Van

Approximate Solutions to Optimal Stopping Problems

We propose and analyze an algorithm that approximates solutions to the problem of optimal stopping in a discounted irreducible aperiodic Markovchain. The scheme involves the use of linear combinations offixed basis functions to approximate a Q-function. The weights of the linear combination are incrementally updated through an iterative process similar to Q-Iearning, involving simulation ofthe underlying Markov chain. Due to space limitations, we only provide an overview of a proof of convergence (with probability 1)and bounds on the approximation error. This is the first theoretical result that establishes the soundness of a Q-Iearninglike algorithmwhen combined with arbitrary linear function approximators tosolve a sequential decision problem.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.36)

Tsitsiklis, John N., Roy, Benjamin Van

Analysis of Temporal-Diffference Learning with Function Approximation

The algorithm weanalyze performs online updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counterexamples with regards to the Significance of online updating and linearly parameterized function approximators. 1 INTRODUCTION The problem of predicting the expected long-term future cost (or reward) of a stochastic dynamic system manifests itself in both time-series prediction and control. Anexample in time-series prediction is that of estimating the net present value of a corporation, as a discounted sum of its future cash flows, based on the current state of its operations. In control, the ability to predict long-term future cost as a function of state enables the ranking of alternative states in order to guide decision-making. Indeed, such predictions constitute the cost-to-go function that is central to dynamic programming and optimal control (Bertsekas, 1995). Temporal-difference learning, originally proposed by Sutton (1988), is a method for approximating long-term future cost as a function of current state.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.39)

Tesauro, Gerald, Galperin, Gregory R.

On-line Policy Improvement using Monte-Carlo Search

Policy iteration is known to have rapid and robust convergence properties, and for Markov tasks with lookup-table state-space representations, it is guaranteed to convergence to the optimal policy. Online Policy Improvement using Monte-Carlo Search 1069 In typical uses of policy iteration, the policy improvement step is an extensive off-line procedure. For example, in dynamic programming, one performs a sweep through all states in the state space. Reinforcement learning provides another approach topolicy improvement; recently, several authors have investigated using RL in conjunction with nonlinear function approximators to represent the value functions and/orpolicies (Tesauro, 1992; Crites and Barto, 1996; Zhang and Dietterich, 1996). These studies are based on following actual state-space trajectories rather than sweeps through the full state space, but are still too slow to compute improved policies in real time.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Industry: Leisure & Entertainment > Games > Backgammon (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning

Schneider, Jeff G.

Model learning combined with dynamic programming has been shown to be effective for learning control of continuous state dynamic systems. The simplest method assumes the learned model is correct and applies dynamic programming to it, but many approximators provide uncertainty estimates on the fit. How can they be exploited? This paper addresses the case where the system must be prevented from having catastrophic failures during learning.We propose a new algorithm adapted from the dual control literature and use Bayesian locally weighted regression models with dynamic programming.A common reinforcement learning assumption is that aggressive exploration should be encouraged. This paper addresses the converse casein which the system has to reign in exploration.

controller, machine learning, reinforcement learning, (16 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Learning from Demonstration

Schaal, Stefan

By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking.

demonstration, machine learning, reinforcement learning, (15 more...)

Country:

North America > United States (0.28)
Europe (0.28)
Asia > Japan (0.28)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Hansen, Eric A., Barto, Andrew G., Zilberstein, Shlomo

Reinforcement Learning for Mixed Open-loop and Closed-loop Control

Closed-loop control relies on sensory feedback that is usually assumed tobe free . But if sensing incurs a cost, it may be costeffective totake sequences of actions in open-loop mode. We describe a reinforcement learning algorithm that learns to combine open-loop and closed-loop control when sensing incurs a cost. Although weassume reliable sensors, use of open-loop control means that actions must sometimes be taken when the current state of the controlled system is uncertain. This is a special case of the hidden-state problem in reinforcement learning, and to cope, our algorithm relies on short-term memory.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

North America > United States > Massachusetts (0.28)
Europe > United Kingdom > England (0.28)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Geothermal System for Power Generation > Advanced Geothermal System (AGS) (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)