Pseudo-Labels Are All You Need

Kostić, Bogdan, Lucka, Mathis, Risch, Julian

Aug-19-2022–arXiv.org Artificial Intelligence

Automatically estimating how complex a text is for its readers has a variety of applications, such as recommending texts of appropriate complexity to language learners or supporting the evaluation of text simplification approaches. In this paper, we present our submission to the Text Complexity DE Challenge 2022, a regression task in which the goal is to predict the complexity of a German sentence for German learners at level B. Our approach relies on more than 220,000 pseudo-labels created from the German Wikipedia and other corpora to train Transformer-based models, and refrains from any feature engineering and from using any additional labeled data. We find that the pseudo-label-based approach gives impressive results yet requires little to no adjustment to the specific task, and could therefore be easily adapted to other domains and tasks.
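
The abstract does not spell out the training pipeline, so the following is only a minimal sketch of generic pseudo-labeling for a regression task, not the authors' implementation. A teacher model is fit on the small labeled set, its predictions on unlabeled sentences become pseudo-labels, and a student model is then trained on the enlarged set. Everything named here is an assumption made for illustration: the one-variable least-squares model stands in for a Transformer regressor, the token-count `feature` is a toy input (the paper itself avoids feature engineering), and training the student on labeled plus pseudo-labeled data together is one common choice rather than a detail confirmed by the source.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b.

    Toy stand-in for a Transformer regressor (assumption, not the paper's model).
    """
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


def predict(model, xs):
    """Apply a fitted (slope, intercept) pair to a list of inputs."""
    a, b = model
    return [a * x + b for x in xs]


def feature(sentence):
    # Hypothetical toy input: number of whitespace-separated tokens.
    return len(sentence.split())


def pseudo_label(labeled, unlabeled):
    """Generic pseudo-labeling loop for regression.

    labeled:   list of (sentence, complexity_score) pairs
    unlabeled: list of sentences without scores
    Returns the teacher, the student, and the (sentence, pseudo_score) pairs.
    """
    xs = [feature(s) for s, _ in labeled]
    ys = [y for _, y in labeled]

    # 1. Fit the teacher on the labeled data only.
    teacher = fit_line(xs, ys)

    # 2. Let the teacher score the unlabeled sentences.
    ux = [feature(s) for s in unlabeled]
    pseudo = predict(teacher, ux)

    # 3. Train the student on labeled plus pseudo-labeled data (assumed setup).
    student = fit_line(xs + ux, ys + pseudo)
    return teacher, student, list(zip(unlabeled, pseudo))
```

In a realistic setting the unlabeled pool would be much larger than the labeled set (the abstract mentions more than 220,000 pseudo-labels), and both teacher and student would be neural models rather than a fitted line.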

complexity, computational linguistic, dataset

Country:
- North America
  - Dominican Republic (0.04)
  - United States > New Mexico
    - Santa Fe County > Santa Fe (0.04)
- Europe
  - Spain (0.04)
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Germany
    - Saxony > Leipzig (0.05)
    - Brandenburg > Potsdam (0.04)
    - North Rhine-Westphalia > Düsseldorf Region
      - Düsseldorf (0.14)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Middle East > Republic of Türkiye
    - Istanbul Province > Istanbul (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.35)
  - Machine Learning > Neural Networks
    - Deep Learning (0.35)
