A Simple Recipe for Multilingual Grammatical Error Correction

Rothe, Sascha, Mallinson, Jonathan, Malmi, Eric, Krause, Sebastian, Severyn, Aliaksei

Aug-9-2022–arXiv.org Artificial Intelligence

This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once fine-tuned on language-specific supervised sets we surpass the previous state-of-the-art results on GEC benchmarks in four languages: English, Czech, German and Russian. Having established a new set of baselines for GEC, we make our results easily reproducible and accessible by releasing a cLang-8 dataset. It is produced by using our best model, which we call gT5, to clean the targets of a widely used yet noisy lang-8 dataset. cLang-8 greatly simplifies typical GEC training pipelines composed of multiple fine-tuning stages -- we demonstrate that performing a single fine-tuning step on cLang-8 with the off-the-shelf language models yields further accuracy improvements over an already top-performing gT5 model for English.

computational linguistic, correction, dataset, (14 more...)

arXiv.org Artificial Intelligence

Aug-9-2022

arXiv.org PDF

Add feedback

Country:
- North America
  - United States
    - Washington > King County
      - Seattle (0.04)
    - Oregon > Multnomah County
      - Portland (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - California
      - San Diego County > San Diego (0.04)
      - Los Angeles County > Long Beach (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Italy > Tuscany
    - Florence (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report (0.70)

Technology:
- Information Technology
  - Data Science > Data Quality
    - Data Cleaning (0.63)
  - Artificial Intelligence > Natural Language
    - Grammars & Parsing (0.63)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found