

PAC Generalization Bounds for Co-training

Dasgupta, Sanjoy, Littman, Michael L., McAllester, David A.

Neural Information Processing Systems

In this paper, we study bootstrapping algorithms for learning from unlabeled data. The general idea in bootstrapping is to use some initial labeled data to build a (possibly partial) predictive labeling procedure; then use the labeling procedure to label more data; then use the newly labeled data to build a new predictive procedure, and so on. This process can be iterated until a fixed point is reached or some other stopping criterion is met. Here we give PAC-style bounds on generalization error which can be used to formally justify certain bootstrapping algorithms. One well-known form of bootstrapping is the EM algorithm (Dempster, Laird and Rubin, 1977).
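The iterate-until-fixed-point procedure described above can be sketched in a few lines. This is a minimal illustrative toy, not the paper's algorithm: a one-dimensional task where the "classifier" is a threshold between class means, and the partial labeling procedure abstains on low-confidence points near the boundary. All names and data here are assumptions made for the example.

```python
def fit_threshold(labeled):
    """Fit a decision threshold halfway between the two class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x, margin=0.5):
    """Partial rule: label confidently, or abstain (None) near the boundary."""
    if x >= threshold + margin:
        return 1
    if x <= threshold - margin:
        return 0
    return None  # abstain on low-confidence points

def self_train(labeled, unlabeled, rounds=5):
    """Bootstrap: label confident unlabeled points, refit, repeat."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        t = fit_threshold(labeled)
        still_unlabeled = []
        for x in unlabeled:
            y = predict(t, x)
            if y is None:
                still_unlabeled.append(x)   # keep for a later round
            else:
                labeled.append((x, y))      # trust our own label
        if len(still_unlabeled) == len(unlabeled):
            break  # fixed point: no new points were labeled
        unlabeled = still_unlabeled
    return fit_threshold(labeled)

seed = [(-3.0, 0), (-2.5, 0), (2.5, 1), (3.0, 1)]   # initial labeled data
pool = [-2.0, -1.5, 1.5, 2.0, -0.2, 0.3]            # unlabeled pool
t = self_train(seed, pool)
```

Note that the procedure remains *partial*: the points at -0.2 and 0.3 stay unlabeled because no round ever labels them confidently, which is exactly the kind of abstention the bounds in the paper are designed to accommodate.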



The rule-based bootstrapping introduced by Yarowsky, and its co-training variant by Blum and Mitchell, have met with considerable empirical success. Earlier work on the theory of co-training has been only loosely related to empirically useful co-training algorithms. Here we give a new PAC-style bound on generalization error which justifies both the use of confidences -- partial rules and partial labeling of the unlabeled data -- and the use of an agreement-based objective function as suggested by Collins and Singer. Our bounds apply to the multiclass case, i.e., where instances are to be assigned one of
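In the spirit of the agreement-based objective mentioned above, the quantity being optimized can be illustrated as the rate at which two partial rules, each seeing a different view of an instance, agree on the instances where both fire. The two-view data and the rules below are illustrative toys, not taken from the paper.

```python
def agreement_rate(rule1, rule2, pairs):
    """Fraction of instances where both rules fire and give the same label.

    Each rule may abstain by returning None; abstentions are excluded,
    so this measures agreement only over the jointly confident region.
    """
    both_fire = [(x1, x2) for x1, x2 in pairs
                 if rule1(x1) is not None and rule2(x2) is not None]
    if not both_fire:
        return 0.0
    agree = sum(rule1(x1) == rule2(x2) for x1, x2 in both_fire)
    return agree / len(both_fire)

# Toy two-view instances: view 1 is a word, view 2 is a context flag.
pairs = [("inc", True), ("corp", True), ("mr", False),
         ("dr", False), ("inc", False)]

rule1 = lambda w: {"inc": "org", "corp": "org", "mr": "person"}.get(w)  # abstains on "dr"
rule2 = lambda ctx: "org" if ctx else "person"                          # always fires

print(agreement_rate(rule1, rule2, pairs))  # prints 0.75
```

A high agreement rate over a large jointly confident region is the kind of observable quantity that, under the paper's assumptions, can be turned into a bound on the error of each rule without seeing the true labels.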