Are We Really Making Much Progress in Text Classification? A Comparative Review
Galke, Lukas; Diera, Andor; Lin, Bao Xin; Khera, Bhakti; Meuser, Tim; Singhal, Tushar; Karl, Fabian; Scherp, Ansgar
arXiv.org Artificial Intelligence
This study reviews and compares methods for single-label and multi-label text classification, categorized into bag-of-words, sequence-based, graph-based, and hierarchical methods. The comparison aggregates results from the literature over five single-label and seven multi-label datasets and complements them with new experiments. The findings reveal that all recently proposed graph-based and hierarchy-based methods fail to outperform pre-trained language models and sometimes perform worse than standard machine learning methods like a multilayer perceptron on a bag-of-words. To assess the true scientific progress in text classification, future work should thoroughly test against strong bag-of-words baselines and state-of-the-art pre-trained language models.
Jun-4-2023
- Country:
- South America
- Uruguay > Maldonado
- Maldonado (0.04)
- Chile > Santiago Metropolitan Region
- Santiago Province > Santiago (0.04)
- Oceania > Australia
- Victoria > Melbourne (0.04)
- New South Wales > Sydney (0.04)
- North America
- Dominican Republic (0.04)
- United States
- Maryland > Baltimore (0.04)
- Nevada (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Texas
- Travis County > Austin (0.14)
- Tarrant County > Fort Worth (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California
- Los Angeles County > Long Beach (0.14)
- San Diego County > San Diego (0.04)
- Orange County > Irvine (0.04)
- New York > New York County
- New York City (0.04)
- Puerto Rico > San Juan
- San Juan (0.04)
- Mexico > Quintana Roo
- Cancún (0.04)
- Canada
- Quebec > Montreal (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Europe
- Germany (0.05)
- Austria (0.04)
- Hungary > Budapest
- Budapest (0.04)
- Middle East > Malta
- Port Region > Southern Harbour District > Valletta (0.04)
- Italy > Tuscany
- Pisa Province > Pisa (0.04)
- Florence (0.04)
- Spain
- Denmark > Capital Region
- Copenhagen (0.04)
- Belgium > Flanders
- West Flanders > Bruges (0.04)
- Netherlands > Gelderland
- Nijmegen (0.04)
- France > Auvergne-Rhône-Alpes
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Asia
- South Korea (0.14)
- Singapore (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- Middle East
- Japan > Honshū
- Kansai > Osaka Prefecture > Osaka (0.04)
- China
- Hong Kong (0.04)
- Tianjin Province > Tianjin (0.04)
- Guangdong Province > Shenzhen (0.04)
- Beijing > Beijing (0.04)
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Overview (1.00)
- Technology: