Co-Training and Expansion: Towards Bridging Theory and Practice

Balcan, Maria-Florina; Blum, Avrim; Yang, Ke

Dec-31-2005–Neural Information Processing Systems

Co-training is a method for combining labeled and unlabeled data when examples can be thought of as containing two distinct sets of features. It has had a number of practical successes, yet previous theoretical analyses have needed very strong assumptions on the data that are unlikely to be satisfied in practice. In this paper, we propose a much weaker "expansion" assumption on the underlying data distribution, which we prove is sufficient for iterative co-training to succeed given appropriately strong PAC-learning algorithms on each feature set, and which to some extent is necessary as well. This expansion assumption in fact motivates the iterative nature of the original co-training algorithm, unlike stronger assumptions (such as independence given the label) that allow a simpler one-shot co-training to succeed. We also heuristically analyze the effect on performance of noise in the data. Predicted behavior is qualitatively matched in synthetic experiments on expander graphs.
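
To make the contrast between iterative and one-shot co-training concrete, here is a minimal Python sketch. It is not the authors' algorithm or experimental code. It assumes a simplified, noise-free setting in which each unlabeled example is a pair of discrete feature values (one per view), and in which a view's "learner" simply memorizes the values it is confident are positive. The function name `iterative_cotrain` and its parameters are hypothetical.

```python
def iterative_cotrain(examples, seed_view1, max_rounds=100):
    """Propagate positive confidence between two views.

    examples   -- unlabeled (x1, x2) pairs, one feature value per view
    seed_view1 -- view-1 values initially known to be positive
    Returns the sets of confident view-1 and view-2 values.

    Assumption: no noise, so an example that is confident in one view
    is treated as positive in the other view as well.
    """
    conf1 = set(seed_view1)
    conf2 = set()
    for _ in range(max_rounds):
        # View 1 labels unlabeled examples, which then train view 2.
        new2 = {x2 for x1, x2 in examples if x1 in conf1} - conf2
        conf2 |= new2
        # View 2 labels unlabeled examples, which then train view 1.
        new1 = {x1 for x1, x2 in examples if x2 in conf2} - conf1
        conf1 |= new1
        if not new1 and not new2:
            break  # nothing new was learned, so further rounds cannot help
    return conf1, conf2


# Hypothetical toy data: a chain a-p-b-q-c plus a disconnected pair d-r.
examples = [("a", "p"), ("b", "p"), ("b", "q"), ("c", "q"), ("d", "r")]

one_shot = iterative_cotrain(examples, {"a"}, max_rounds=1)
iterated = iterative_cotrain(examples, {"a"})
```

On this toy data a single round reaches only `b` and `p`, while iterating continues along the chain to `q` and `c`. The pair `("d", "r")` is never reached because no unlabeled example connects it to the seed, which is the kind of situation an expansion-style condition on the data is meant to rule out within the positive class.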

artificial intelligence, machine learning, natural language

Country:
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Unsupervised or Indirectly Supervised Learning (0.35)
