
Collaborating Authors

 Baxter, Jonathan


Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

Neural Information Processing Systems

We consider the use of two additive control variate methods to reduce the variance of performance gradient estimates in reinforcement learning problems. The first approach we consider is the baseline method, in which a function of the current state is added to the discounted value estimate. We relate the performance of these methods, which use sample paths, to the variance of estimates based on iid data. We derive the baseline function that minimizes this variance, and we show that the variance for any baseline is the sum of the optimal variance and a weighted squared distance to the optimal baseline. We show that the widely used average discounted value baseline (where the reward is replaced by the difference between the reward and its expectation) is suboptimal. The second approach we consider is the actor-critic method, which uses an approximate value function. We give bounds on the expected squared error of its estimates. We show that minimizing distance to the true value function is suboptimal in general; we provide an example for which the true value function gives an estimate with positive variance, but the optimal value function gives an unbiased estimate with zero variance. Our bounds suggest algorithms to estimate the gradient of the performance of parameterized baseline or value functions.
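
A minimal sketch of the baseline idea, assuming a generic episodic REINFORCE-style setting; the trajectory format, the fixed discount factor, and the function names below are illustrative assumptions, not the paper's algorithm.

import numpy as np

def gradient_estimate(episodes, grad_log_pi, baseline, gamma=0.95):
    """Policy-gradient estimate with an additive state-dependent baseline.

    episodes: list of trajectories, each a list of (state, action, reward).
    grad_log_pi(s, a): gradient of log pi(a | s) w.r.t. the policy parameters.
    baseline(s): control variate subtracted from the discounted return.
    """
    estimates = []
    for traj in episodes:
        rewards = np.array([r for _, _, r in traj], dtype=float)
        g = 0.0
        for t, (s, a, _) in enumerate(traj):
            discounts = gamma ** np.arange(len(traj) - t)
            ret = float(np.dot(discounts, rewards[t:]))  # discounted return from time t
            g = g + grad_log_pi(s, a) * (ret - baseline(s))
        estimates.append(g)
    return np.mean(estimates, axis=0)  # average over sample paths

Subtracting baseline(s) leaves the estimate unbiased for any state-dependent baseline; the question the paper addresses is which baseline minimizes the variance of this estimate.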


Boosting Algorithms as Gradient Descent

Neural Information Processing Systems

Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers [1, 18]. Loosely speaking, if a combination of classifiers correctly classifies most of the training data with a large margin, then its error probability is small. In [14] we gave improved upper bounds on the misclassification probability of a combined classifier in terms of the average over the training data of a certain cost function of the margins. That paper also described DOOM, an algorithm for directly minimizing the margin cost function by adjusting the weights associated with each base classifier (the base classifiers are supplied to DOOM). DOOM exhibits performance improvements over AdaBoost, even when using the same base hypotheses, which provides additional empirical evidence that these margin cost functions are appropriate quantities to optimize. In this paper, we present a general class of algorithms (called AnyBoost) which are gradient descent algorithms for choosing linear combinations of elements of an inner product function space so as to minimize some cost functional. The normal operation of a weak learner is shown to be equivalent to maximizing a certain inner product. We prove convergence of AnyBoost under weak conditions. In Section 3, we show that this general class of algorithms includes as special cases nearly all existing voting methods.
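
A minimal sketch of an AnyBoost-style loop, assuming binary labels in {-1, +1}, an exponential margin cost as a placeholder, and a fixed step size in place of a line search; the weak_learner interface is an assumption of the sketch, not the paper's exact procedure.

import numpy as np

def anyboost(X, y, weak_learner, rounds=50, cost_grad=None, step=0.1):
    """Gradient descent in function space for a linear combination of base classifiers.

    X: (m, d) training inputs; y: labels in {-1, +1}.
    weak_learner(X, w): returns a base classifier h with h(X) in {-1, +1},
        trained to make the weighted correlation sum_i w_i * y_i * h(x_i) large.
    cost_grad(z): derivative c'(z) of the margin cost c; the default uses
        the exponential cost c(z) = exp(-z), so c'(z) = -exp(-z).
    """
    if cost_grad is None:
        cost_grad = lambda z: -np.exp(-z)
    y = np.asarray(y, dtype=float)
    m = len(y)
    hypotheses, alphas = [], []
    F = np.zeros(m)                     # current combined scores on the training set
    for _ in range(rounds):
        margins = y * F
        w = -cost_grad(margins) / m     # magnitude of the negative functional gradient at each example
        h = weak_learner(X, w)          # aims to maximize the inner product with that direction
        hx = h(X)
        if np.sum(w * y * hx) <= 0:     # no descent direction available: stop
            break
        hypotheses.append(h)
        alphas.append(step)             # fixed step here; a line search is another option
        F = F + step * hx
    return hypotheses, alphas

Each round descends the cost functional in function space: the example weights are the negative functional gradient evaluated at the training points, and the weak learner's job is to return a base classifier with a large inner product against that direction.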


Direct Optimization of Margins Improves Generalization in Combined Classifiers

Neural Information Processing Systems

Figure caption: the dark curve is AdaBoost, the light curve is DOOM; DOOM sacrifices significant training error for improved test error (horizontal marks on the margin-0 line). Many learning algorithms for pattern classification minimize some cost function of the training data, with the aim of minimizing error (the probability of misclassifying an example). One example of such a cost function is simply the classifier's error on the training data.
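
For concreteness, a standard statement of the quantities involved, assuming a voted classifier with labels in {-1, +1} and the usual normalization of the voting weights; the specific cost function C is left abstract, as in the excerpt above.

F(x) = \sum_t \alpha_t h_t(x), \qquad
\mathrm{margin}_i = \frac{y_i F(x_i)}{\sum_t |\alpha_t|}, \qquad
\widehat{C}(F) = \frac{1}{m} \sum_{i=1}^{m} C(\mathrm{margin}_i).

When C(z) dominates the step function 1[z <= 0], the sample average \widehat{C}(F) upper-bounds the training error, and bounds of the kind the excerpt refers to relate it to the misclassification probability.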



TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search

arXiv.org Artificial Intelligence

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which demonstrate its utility and provide comparisons with TD(lambda) and another less radical variant, TD-directed(lambda). In particular, our chess program, "KnightCap," used TDLeaf(lambda) to learn its evaluation function while playing on the Free Internet Chess Server (FICS, fics.onenet.net). It improved from a 1650 rating to a 2100 rating in just 308 games. We discuss some of the reasons for this success and the relationship between our results and Tesauro's results in backgammon.
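
A minimal sketch of a TDLeaf(lambda)-style weight update, assuming the caller supplies, for each move, the evaluation of the principal-variation leaf found by minimax search together with its gradient with respect to the evaluation weights; this interface, the handling of the final game outcome, and the fixed learning rate are assumptions of the sketch rather than KnightCap's exact implementation.

import numpy as np

def tdleaf_update(w, leaf_values, leaf_grads, alpha=1e-3, lam=0.7):
    """One TDLeaf(lambda)-style parameter update for a game just played.

    leaf_values[t]: evaluation of the leaf of the principal variation returned
        by minimax search from the position at move t, using the current weights w.
    leaf_grads[t]: gradient of that leaf evaluation w.r.t. w.
    The last entry of leaf_values can be the game outcome so that the final
    temporal difference pulls the evaluations toward the actual result.
    """
    w = np.asarray(w, dtype=float).copy()
    values = np.asarray(leaf_values, dtype=float)
    d = np.diff(values)                       # d[t] = eval_{t+1} - eval_t
    N = len(values)
    for t in range(N - 1):
        weights = lam ** np.arange(N - 1 - t)  # lambda-weighted future temporal differences
        err = float(np.dot(weights, d[t:]))
        w += alpha * np.asarray(leaf_grads[t], dtype=float) * err
    return w

The only change relative to TD(lambda) is that the temporal differences and gradients are taken at the leaves of the principal variations rather than at the root positions themselves.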


The Canonical Distortion Measure in Feature Space and 1-NN Classification

Neural Information Processing Systems

We prove that the Canonical Distortion Measure (CDM) [2, 3] is the optimal distance measure to use for 1 nearest-neighbour (1-NN) classification, and show that it reduces to squared Euclidean distance in feature space for function classes that can be expressed as linear combinations of a fixed set of features. PAC-like bounds are given on the sample complexity required to learn the CDM. An experiment is presented in which a neural network CDM was learnt for a Japanese OCR environment and then used to do 1-NN classification.
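
A minimal sketch of the resulting classifier, assuming a learned feature map standing in for the CDM features (a neural network in the OCR experiment); the function names and raw-input format are assumptions of the sketch.

import numpy as np

def one_nn_predict(query, train_X, train_y, feature_map):
    """1-NN classification using squared Euclidean distance in feature space.

    feature_map(x): maps a raw input to its feature vector; for function
    classes that are linear in a fixed feature set, this distance is the
    form the CDM takes in feature space.
    """
    q = feature_map(query)
    dists = [float(np.sum((feature_map(x) - q) ** 2)) for x in train_X]
    return train_y[int(np.argmin(dists))]

For such linear function classes, this squared Euclidean distance in feature space is exactly the distance the CDM result justifies using for 1-NN.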


Learning Model Bias

Neural Information Processing Systems

In this paper the problem of learning appropriate domain-specific bias is addressed. It is shown that this can be achieved by learning many related tasks from the same domain, and a theorem is given bounding the number of tasks that must be learnt.
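
A minimal sketch of the "many related tasks" setup under a shared-representation reading; the squared-error loss, the shared_features map, and the per-task heads are assumptions of the sketch, not the paper's construction.

import numpy as np

def multitask_loss(shared_features, task_heads, tasks):
    """Empirical loss summed over related tasks that share one feature map.

    shared_features(x): the common representation learned across tasks.
    task_heads[k](z): task k's predictor applied to the shared features.
    tasks: list of (X_k, y_k) training sets, one per task.
    """
    total, count = 0.0, 0
    for k, (X, y) in enumerate(tasks):
        preds = np.array([task_heads[k](shared_features(x)) for x in X])
        total += float(np.sum((preds - np.asarray(y, dtype=float)) ** 2))
        count += len(y)
    return total / count

Under this reading, the shared feature map plays the role of the learned bias: it is fit jointly across the related tasks, and each new task from the same domain only needs its own small head on top of it.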