Are We Really Making Much Progress in Text Classification? A Comparative Review

Lukas Galke, Andor Diera, Bao Xin Lin, Bhakti Khera, Tim Meuser, Tushar Singhal, Fabian Karl, Ansgar Scherp

Jun-4-2023, arXiv.org Artificial Intelligence

This study reviews and compares methods for single-label and multi-label text classification, categorized into bag-of-words, sequence-based, graph-based, and hierarchical methods. The comparison aggregates results from the literature over five single-label and seven multi-label datasets and complements them with new experiments. The findings reveal that all recently proposed graph-based and hierarchy-based methods fail to outperform pre-trained language models and sometimes perform worse than standard machine learning methods like a multilayer perceptron on a bag-of-words. To assess the true scientific progress in text classification, future work should thoroughly test against strong bag-of-words baselines and state-of-the-art pre-trained language models.

machine learning, natural language, text classification

Country:
- South America
  - Uruguay > Maldonado
    - Maldonado (0.04)
  - Chile > Santiago Metropolitan Region
    - Santiago Province > Santiago (0.04)
- Oceania > Australia
  - Victoria > Melbourne (0.04)
  - New South Wales > Sydney (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Maryland > Baltimore (0.04)
    - Nevada (0.04)
    - Michigan > Washtenaw County
      - Ann Arbor (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Texas
      - Travis County > Austin (0.14)
      - Tarrant County > Fort Worth (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California
      - Los Angeles County > Long Beach (0.14)
      - San Diego County > San Diego (0.04)
      - Orange County > Irvine (0.04)
    - New York > New York County
      - New York City (0.04)
  - Puerto Rico > San Juan
    - San Juan (0.04)
  - Mexico > Quintana Roo
    - Cancún (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Germany (0.05)
  - Austria (0.04)
  - Hungary > Budapest
    - Budapest (0.04)
  - Middle East > Malta
    - Port Region > Southern Harbour District > Valletta (0.04)
  - Italy > Tuscany
    - Pisa Province > Pisa (0.04)
    - Florence (0.04)
  - Spain
    - Valencian Community > Valencia Province
      - Valencia (0.04)
    - Galicia > A Coruña Province
      - Santiago de Compostela (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Belgium > Flanders
    - West Flanders > Bruges (0.04)
  - Netherlands > Gelderland
    - Nijmegen (0.04)
  - France > Auvergne-Rhône-Alpes
    - Lyon > Lyon (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - South Korea (0.14)
  - Singapore (0.04)
  - Taiwan > Taiwan Province
    - Taipei (0.04)
  - Middle East
    - Jordan (0.04)
    - Israel (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
    - Qatar > Ad-Dawhah
      - Doha (0.04)
  - Japan > Honshū
    - Kansai > Osaka Prefecture > Osaka (0.04)
  - China
    - Hong Kong (0.04)
    - Tianjin Province > Tianjin (0.04)
    - Guangdong Province > Shenzhen (0.04)
    - Beijing > Beijing (0.04)
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)

Genre:
- Research Report > New Finding (1.00)
- Overview (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Text Processing (1.00)
    - Text Classification (1.00)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks
      - Deep Learning (1.00)
      - Perceptrons (0.68)
