Co-Training and Expansion: Towards Bridging Theory and Practice

Balcan, Maria-florina, Blum, Avrim, Yang, Ke

Neural Information Processing Systems 

Co-training is a method for combining labeled and unlabeled data when examples can be thought of as containing two distinct sets of features. It has had a number of practical successes, yet previous theoretical analyses have needed very strong assumptions on the data that are unlikely to be satisfied in practice. In this paper, we propose a much weaker "expansion" assumption on the underlying data distribution, that we prove is sufficient for iterative cotraining tosucceed given appropriately strong PAClearning algorithms on each feature set, and that to some extent is necessary as well. This expansion assumption in fact motivates the iterative nature of the original co-trainingalgorithm, unlike stronger assumptions (such as independence giventhe label) that allow a simpler one-shot co-training to succeed. We also heuristically analyze the effect on performance of noise in the data. Predicted behavior is qualitatively matched in synthetic experiments onexpander graphs.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found