Pseudolikelihood Reranking with Masked Language Models

Salazar, Julian, Liang, Davis, Nguyen, Toan Q., Kirchhoff, Katrin

Oct-31-2019–arXiv.org Machine Learning

We rerank with scores from pretrained masked language models like BERT to improve ASR and NMT performance. These log-pseudolikelihood scores (LPLs) can outperform large, autoregressive language models (GPT-2) in out-of-the-box scoring. RoBERTa reduces WER by up to 30% relative on an end-to-end LibriSpeech system and adds up to 1.7 BLEU on state-of-the-art baselines for TED Talks low-resource pairs, with further gains from domain adaptation. In the multilingual setting, a single XLM can be used to rerank translation outputs in multiple languages. The numerical and qualitative properties of LPL scores suggest that LPLs capture sentence fluency better than autoregressive scores. Finally, we finetune BERT to estimate sentence LPLs without masking, enabling scoring in a single, non-recurrent inference pass.
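The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of how a log-pseudolikelihood score and a reranking step might look. It assumes the usual definition of pseudolikelihood (mask each token in turn and sum the log-probabilities the model assigns to the original tokens) and assumes hypotheses are reranked by a weighted sum of the base system score and the LPL. The names `log_pseudolikelihood`, `rerank`, `toy_masked_log_prob`, and the `weight` parameter are hypothetical, and the toy scorer is a stand-in for a real masked language model such as BERT, not a description of the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

MASK = "[MASK]"

# Hypothetical interface: given the tokens with one position masked,
# return log P(target | remaining tokens) for that position.
MaskedLogProb = Callable[[Sequence[str], int, str], float]


def log_pseudolikelihood(tokens: Sequence[str], masked_log_prob: MaskedLogProb) -> float:
    """Sum log P(token_i | all other tokens), masking one position at a time."""
    total = 0.0
    for i, target in enumerate(tokens):
        masked = list(tokens)
        masked[i] = MASK  # hide the token being scored
        total += masked_log_prob(masked, i, target)
    return total


def rerank(
    hypotheses: List[Tuple[List[str], float]],
    masked_log_prob: MaskedLogProb,
    weight: float = 1.0,
) -> List[str]:
    """Return the hypothesis maximizing base_score + weight * LPL (assumed combination)."""
    best_tokens, _ = max(
        hypotheses,
        key=lambda h: h[1] + weight * log_pseudolikelihood(h[0], masked_log_prob),
    )
    return best_tokens


# Toy stand-in for a masked LM: it only looks at the left neighbor and a tiny
# bigram table. A real masked LM would condition on both sides of the mask.
TOY_BIGRAMS = {("<s>", "the"), ("the", "cat"), ("cat", "sat")}


def toy_masked_log_prob(masked_tokens: Sequence[str], position: int, target: str) -> float:
    left = masked_tokens[position - 1] if position > 0 else "<s>"
    return math.log(0.9 if (left, target) in TOY_BIGRAMS else 0.1)


if __name__ == "__main__":
    n_best = [
        (["the", "sat", "cat"], -1.0),  # higher base score, less fluent
        (["the", "cat", "sat"], -1.2),  # lower base score, more fluent
    ]
    print(rerank(n_best, toy_masked_log_prob))
```

With a real model, `masked_log_prob` would run one forward pass per masked position, so scoring a sentence of n tokens costs n passes. That cost is what the single-pass, maskless estimate mentioned at the end of the abstract is meant to avoid.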

bert, language model, machine translation

Country:
- North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:
- Research Report (0.64)

Industry:
- Education (0.55)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Machine Translation (1.00)
    - Large Language Model (0.90)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
