Sparse Local Embeddings for Extreme Multi-label Classification

Bhatia, Kush, Jain, Himanshu, Kar, Purushottam, Varma, Manik, Jain, Prateek

Dec-31-2015–Neural Information Processing Systems

The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding-based approaches attempt to make training and prediction tractable by assuming that the training label matrix is low-rank and reducing the effective number of labels by projecting the high-dimensional label vectors onto a low-dimensional linear subspace. Still, leading embedding approaches have been unable to deliver high prediction accuracies or scale to large problems, as the low-rank assumption is violated in most real-world applications. In this paper we develop the SLEEC classifier to address both limitations. The main technical contribution in SLEEC is a formulation for learning a small ensemble of local distance-preserving embeddings which can accurately predict infrequently occurring (tail) labels. This allows SLEEC to break free of the traditional low-rank assumption and boost classification accuracy by learning embeddings which preserve pairwise distances between only the nearest label vectors. We conducted extensive experiments on several real-world as well as benchmark datasets and compared our method against state-of-the-art methods for extreme multi-label classification. Experiments reveal that SLEEC can make significantly more accurate predictions than the state-of-the-art methods, including both embedding-based (by as much as 35%) and tree-based (by as much as 6%) methods. SLEEC can also scale efficiently to datasets with a million labels, which are beyond the pale of leading embedding methods.
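
To make the idea of preserving pairwise distances between only the nearest label vectors more concrete, the following is a minimal sketch in Python with NumPy. It is not the SLEEC implementation: the function names `neighbor_mask` and `local_distance_loss`, the use of squared Euclidean distance, and the squared-error penalty are all assumptions made for illustration, and the abstract does not specify the exact objective or how it is optimized. The sketch only shows how a loss can be restricted to each point's k nearest neighbours in label space, so that distant pairs place no constraint on the embedding.

```python
import numpy as np


def neighbor_mask(Y, k):
    """Return an n x n boolean mask; mask[i, j] is True when row j is one of
    the k nearest other rows to row i (squared Euclidean distance on Y)."""
    sq = np.sum(Y * Y, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Y @ Y.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbour
    idx = np.argsort(d2, axis=1, kind="stable")[:, :k]
    mask = np.zeros(d2.shape, dtype=bool)
    mask[np.arange(Y.shape[0])[:, None], idx] = True
    return mask


def local_distance_loss(Y, Z, mask):
    """Squared error between label-space and embedding-space squared
    distances, summed only over the pairs selected by mask.
    (Hypothetical objective, chosen for illustration.)"""
    def sqdist(A):
        s = np.sum(A * A, axis=1)
        return s[:, None] + s[None, :] - 2.0 * (A @ A.T)

    diff = sqdist(Y) - sqdist(Z)
    return float(np.sum(diff[mask] ** 2))


# Toy label matrix: rows are data points, columns are labels.
Y = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)
mask = neighbor_mask(Y, k=1)
print(local_distance_loss(Y, Y, mask))  # identity embedding: no distortion
```

In this toy example the two groups of points share no labels, so with k = 1 the mask never pairs a point from one group with a point from the other. An embedding is then free to place the groups anywhere relative to each other, which is the sense in which a local objective does not require the full label matrix to be low-rank.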

artificial intelligence, machine learning, sleec

Country:
- Asia > India (0.28)

Genre:
- Research Report > Promising Solution (0.54)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Performance Analysis > Accuracy (0.34)
