ZeroBERTo: Leveraging Zero-Shot Text Classification by Topic Modeling

Alcoforado, Alexandre, Ferraz, Thomas Palmeira, Gerber, Rodrigo, Bustos, Enzo, Oliveira, André Seidel, Veloso, Bruno Miguel, Siqueira, Fabio Levy, Costa, Anna Helena Reali

Jan-4-2022, arXiv.org Artificial Intelligence

Traditional text classification approaches often require a substantial amount of labeled data, which is difficult to obtain, especially in restricted domains or less widespread languages. This lack of labeled data has led to the rise of low-resource methods, which assume low data availability in natural language processing. Among them, zero-shot learning stands out: it consists of learning a classifier without any previously labeled data. The best results reported with this approach use language models such as Transformers, but they suffer from two problems: high execution time and an inability to handle long texts as input. This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task. We show that ZeroBERTo performs better on long inputs and runs in less time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset.
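
The "cluster first, classify the compressed representation" idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the greedy Jaccard clustering stands in for a real topic model, the keyword-overlap scorer stands in for a Transformer-based zero-shot classifier, and every name here (`cluster_documents`, `zero_shot_label`, `label_hints`, the stopword list) is a hypothetical chosen for illustration.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "at", "a", "of"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def jaccard(a, b):
    """Jaccard similarity between two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.2):
    """Greedy single-pass clustering: a toy stand-in for topic modeling."""
    clusters = []  # each cluster: {"vocab": Counter, "members": [doc index]}
    for i, doc in enumerate(docs):
        tokens = set(tokenize(doc))
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(tokens, set(cluster["vocab"]))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            best = {"vocab": Counter(), "members": []}
            clusters.append(best)
        best["vocab"].update(tokens)  # counts documents containing each word
        best["members"].append(i)
    return clusters


def zero_shot_label(top_words, label_hints):
    """Hypothetical stand-in for a zero-shot classifier: keyword overlap."""
    scores = {label: len(set(top_words) & hints)
              for label, hints in label_hints.items()}
    return max(scores, key=scores.get)


def classify(docs, label_hints, top_k=3):
    """Label each cluster once from its top words, then propagate to members."""
    labels = [None] * len(docs)
    for cluster in cluster_documents(docs):
        # Sort by descending count, then alphabetically, for determinism.
        ranked = sorted(cluster["vocab"].items(), key=lambda kv: (-kv[1], kv[0]))
        top_words = [word for word, _ in ranked[:top_k]]
        label = zero_shot_label(top_words, label_hints)
        for i in cluster["members"]:
            labels[i] = label
    return labels


docs = [
    "the team won the football match",
    "the football team lost the match",
    "the bank raised interest rates",
    "interest rates fell at the bank",
]
label_hints = {
    "sports": {"football", "match", "team", "game"},
    "economy": {"bank", "interest", "rates", "market"},
}
print(classify(docs, label_hints))
```

In this sketch the classifier is called once per cluster rather than once per document, and it only sees a handful of top words instead of the full text, which is the intuition behind the shorter execution time and the long-input handling described above.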

classification, leveraging zero-shot text classification, zeroberto

Country:
- South America > Brazil
  - São Paulo (0.05)
- Europe
  - France (0.04)
  - Portugal > Porto
    - Porto (0.04)

Genre:
- Research Report > New Finding (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Classification (1.00)
  - Machine Learning > Statistical Learning
    - Clustering (0.34)