A no-regret generalization of hierarchical softmax to extreme multi-label classification
Wydmuch, Marek, Jasinska, Kalina, Kuznetsov, Mikhail, Busa-Fekete, Róbert, Dembczynski, Krzysztof
Neural Information Processing Systems
Extreme multi-label classification (XMLC) is the problem of tagging an instance with a small subset of relevant labels chosen from an extremely large pool of possible labels. Large label spaces can be handled efficiently by organizing labels as a tree, as in the hierarchical softmax (HSM) approach commonly used for multi-class problems. In this paper, we investigate probabilistic label trees (PLTs), which have recently been devised for tackling XMLC problems. We show that PLTs are a no-regret multi-label generalization of HSM when precision@$k$ is used as the model evaluation metric. Critically, we prove that the pick-one-label heuristic, a reduction technique from multi-label to multi-class that is routinely used along with HSM, is not consistent in general. We also show that our implementation of PLTs, referred to as extremeText (XT), obtains significantly better results than HSM with the pick-one-label heuristic and than XML-CNN, a deep network specifically designed for XMLC problems. Moreover, XT is competitive with many state-of-the-art approaches in terms of statistical performance, model size, and prediction time, which makes it suitable for deployment in an online system.
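To make the inconsistency claim concrete, here is a minimal sketch of a counterexample for precision@1. It assumes the pick-one-label heuristic replaces each multi-label example with a single relevant label drawn uniformly at random. The toy distribution, the label names, and all function names are hypothetical and are not taken from the paper or from extremeText.

```python
# Hypothetical toy conditional distribution P(y | x) over label subsets
# for a single instance x. Keys are sets of relevant labels.
label_set_probs = {
    frozenset({"a"}): 0.45,
    frozenset({"b", "c"}): 0.55,
}


def marginals(dist):
    """Marginal probability that each label is relevant.

    Ranking labels by these marginals is optimal for precision@k.
    """
    out = {}
    for labels, p in dist.items():
        for label in labels:
            out[label] = out.get(label, 0.0) + p
    return out


def pick_one_label(dist):
    """Multi-class probabilities induced by the assumed heuristic:
    one relevant label is drawn uniformly at random per example."""
    out = {}
    for labels, p in dist.items():
        for label in labels:
            out[label] = out.get(label, 0.0) + p / len(labels)
    return out


def top1(scores):
    # Iterate labels in sorted order so ties break deterministically.
    return max(sorted(scores), key=scores.get)


eta = marginals(label_set_probs)            # a: 0.45, b: 0.55, c: 0.55
eta_pick = pick_one_label(label_set_probs)  # a: 0.45, b: 0.275, c: 0.275

print("top-1 by marginals:     ", top1(eta))
print("top-1 by pick-one-label:", top1(eta_pick))
```

In this toy case the marginals rank label b (or c) first, while the pick-one-label probabilities rank label a first, so a multi-class model that perfectly fits the reduced problem would still make a suboptimal precision@1 prediction.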
Dec-31-2018
- Country:
- Asia
- China > Beijing
- Beijing (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
- Middle East > Israel
- Haifa District > Haifa (0.04)
- Europe
- France
- Auvergne-Rhône-Alpes > Lyon
- Lyon (0.04)
- Occitanie > Hérault
- Montpellier (0.04)
- Italy > Sardinia (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- Poland > Greater Poland Province
- Poznań (0.05)
- Portugal > Porto
- Porto (0.04)
- Spain > Andalusia
- Granada Province > Granada (0.04)
- United Kingdom
- England > Cambridgeshire
- Cambridge (0.04)
- Scotland > City of Edinburgh
- Edinburgh (0.04)
- North America
- Barbados > Saint Michael
- Bridgetown (0.04)
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Nova Scotia > Halifax Regional Municipality
- Halifax (0.04)
- Quebec > Montreal (0.05)
- United States
- Florida > Broward County
- Fort Lauderdale (0.04)
- Nevada (0.04)
- New York
- Bronx County > New York City (0.04)
- Kings County > New York City (0.04)
- New York County > New York City (0.14)
- Queens County > New York City (0.04)
- Richmond County > New York City (0.04)
- Oceania > Australia
- New South Wales > Sydney (0.14)
- South America > Brazil
- Rio de Janeiro > Rio de Janeiro (0.04)
- Genre:
- Research Report (0.88)
- Technology: