Nonparametric Masked Language Modeling
Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer
arXiv.org Artificial Intelligence
Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. NPM fills in the [MASK] solely by retrieving a token from a text corpus. We show that NPM can be efficiently trained with a contrastive objective and an in-batch approximation to full corpus retrieval. Zero-shot evaluation on 16 tasks including classification, fact probing and question answering demonstrates that NPM outperforms significantly larger parametric models, either with or without a retrieve-and-generate approach. It is particularly better at dealing with rare patterns (word senses or facts) and predicting rare or nearly unseen words (e.g., non-Latin script). We release the model and code at github.com/facebookresearch/NPM.
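The abstract's two core ideas, prediction as retrieval over a corpus instead of a softmax over a fixed vocabulary, and contrastive training with an in-batch approximation to full-corpus retrieval, can be illustrated with a short sketch. The toy corpus, random embeddings, and the `fill_mask` / `in_batch_contrastive_loss` helpers below are hypothetical stand-ins for NPM's trained bidirectional encoder, not the released implementation at github.com/facebookresearch/NPM; the sketch is token-level for brevity, while the paper extends the distribution to phrases.

```python
# Minimal, illustrative sketch (assumed setup, not the paper's code):
# (1) fill [MASK] with a softmax over corpus tokens, not a vocabulary;
# (2) an in-batch contrastive objective approximating corpus retrieval.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, tau = 16, 0.1

# Hypothetical corpus: one contextual embedding per corpus token.
# In NPM these come from a trained encoder; random vectors stand in here.
corpus_tokens = ["Seattle", "Tokyo", "apple", "guitar", "Tokyo"]
corpus_emb = F.normalize(torch.randn(len(corpus_tokens), dim), dim=-1)

def fill_mask(mask_emb: torch.Tensor) -> str:
    """Nonparametric prediction: retrieve the corpus token whose
    embedding is most similar to the [MASK] representation."""
    scores = corpus_emb @ F.normalize(mask_emb, dim=-1)  # cosine sims
    probs = F.softmax(scores / tau, dim=-1)              # over corpus, not vocab
    return corpus_tokens[int(probs.argmax())]

def in_batch_contrastive_loss(mask_emb: torch.Tensor,
                              token_emb: torch.Tensor,
                              positives: torch.Tensor) -> torch.Tensor:
    """In-batch approximation to full-corpus retrieval: each masked
    position (B, dim) is scored against every token embedding in the
    batch (N, dim); `positives` is a bool (B, N) mask marking
    occurrences of the gold token, all other tokens act as negatives."""
    logits = (F.normalize(mask_emb, dim=-1)
              @ F.normalize(token_emb, dim=-1).T) / tau  # (B, N)
    pos = torch.logsumexp(logits.masked_fill(~positives, float("-inf")), -1)
    return (torch.logsumexp(logits, -1) - pos).mean()    # -log p(positives)

# Usage: a query near the first "Tokyo" embedding stands in for the
# encoder's output at "[MASK] is the capital of Japan."
query = corpus_emb[1] + 0.05 * torch.randn(dim)
print(fill_mask(query))  # -> "Tokyo": the answer is retrieved, not generated
```

The in-batch trick is what makes training tractable: rather than scoring each [MASK] representation against every token in the full corpus, each batch's own token embeddings double as the candidate set, so non-gold tokens serve as free negatives.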
May-25-2023
- Country:
- North America (0.28)
- Genre:
- Research Report (0.63)
- Industry:
- Leisure & Entertainment (0.67)
- Technology: