McAllester, David A.
PAC Generalization Bounds for Co-training
Dasgupta, Sanjoy, Littman, Michael L., McAllester, David A.
The rule-based bootstrapping introduced by Yarowsky, and its co-training variant by Blum and Mitchell, have met with considerable empirical success. Earlier work on the theory of co-training has been only loosely related to empirically useful co-training algorithms. Here we give a new PAC-style bound on generalization error which justifies both the use of confidences -- partial rules and partial labeling of the unlabeled data -- and the use of an agreement-based objective function as suggested by Collins and Singer. Our bounds apply to the multiclass case, i.e., where instances are to be assigned one of k labels for k ≥ 2.
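
To make the agreement-based objective mentioned above concrete, here is a minimal sketch: on unlabeled data, a pair of partial rules (one per view) is scored by how rarely they disagree when both commit to a label. The function and rule names below, and the convention that a partial rule returns None to abstain, are assumptions for illustration, not the paper's notation.

    # Hypothetical sketch of an agreement-based objective on unlabeled data:
    # a pair of partial rules is scored by how rarely they disagree when
    # both commit to a label. `rule1`, `rule2`, and the None-for-abstention
    # convention are illustrative assumptions.

    def disagreement_rate(rule1, rule2, unlabeled_pairs):
        """Fraction of examples where both rules label, but differently."""
        both = conflicts = 0
        for x1, x2 in unlabeled_pairs:  # the two views of each instance
            y1, y2 = rule1(x1), rule2(x2)
            if y1 is not None and y2 is not None:
                both += 1
                conflicts += (y1 != y2)
        return conflicts / both if both else 0.0

Minimizing such a disagreement rate over rule pairs, while keeping the rules from abstaining too often, is the flavor of objective the bound is meant to justify.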
Boosting with Multi-Way Branching in Decision Trees
Mansour, Yishay, McAllester, David A.
It is known that decision tree learning can be viewed as a form of boosting. However, existing boosting theorems for decision tree learning allow only binary-branching trees and the generalization to multi-branching trees is not immediate. Practical decision tree algorithms, such as CART and C4.5, implement a tradeoff between the number of branches and the improvement in tree quality as measured by an index function. Here we give a boosting justification for a particular quantitative tradeoff curve. Our main theorem states, in essence, that if we require an improvement proportional to the log of the number of branches then top-down greedy construction of decision trees remains an effective boosting algorithm.
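
The quantitative tradeoff in the main theorem can be illustrated with a short sketch: a k-way split is kept only when its index improvement grows with log k. The function names and the constant c below are assumptions, not the paper's notation.

    import math

    # Illustrative acceptance test for a multi-way split: require the index
    # gain (e.g., in Gini or entropy) to be at least proportional to the
    # log of the number of branches, the tradeoff the theorem justifies.
    # `c` is an assumed proportionality constant, not a value from the paper.

    def accept_split(index_before, index_after, num_branches, c=0.01):
        gain = index_before - index_after
        return gain >= c * math.log(num_branches)

A top-down greedy builder would apply such a test before replacing a leaf with a multi-way branch, so that wide splits must earn their extra branches.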
Policy Gradient Methods for Reinforcement Learning with Function Approximation
Sutton, Richard S., McAllester, David A., Singh, Satinder P., Mansour, Yishay
Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
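
As a concrete instance of the gradient-based updates the abstract describes, here is a minimal REINFORCE-style sketch with a linear softmax policy. It illustrates the approach rather than the paper's exact algorithm (which pairs the policy with a compatible value-function approximator), and all names below are assumptions.

    import numpy as np

    # Minimal REINFORCE-style sketch: a softmax policy over linear action
    # preferences, updated along grad log pi(a|s) weighted by the return.
    # One example of the policy-gradient approach, not the paper's exact
    # construction.

    def softmax_policy(theta, features):
        # features: (n_actions, d) matrix of state-action feature vectors
        prefs = features @ theta
        p = np.exp(prefs - prefs.max())
        return p / p.sum()

    def reinforce_update(theta, episode, alpha=0.01):
        # episode: list of (features, action, return_from_that_step)
        for features, a, G in episode:
            p = softmax_policy(theta, features)
            grad_log_pi = features[a] - p @ features  # softmax score function
            theta = theta + alpha * G * grad_log_pi
        return theta

The paper's main result shows that replacing the sampled return G with a suitably constrained learned approximation still yields an unbiased direction, which is what enables the convergence proof.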