XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, Quoc V. Le

Aug-20-2025, 05:38:26 GMT–Neural Information Processing Systems

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy.
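
The claim that BERT neglects the dependency between masked positions can be illustrated with a toy example. Below is a minimal sketch that assumes a hypothetical two-token joint distribution. The names `joint`, `marginal`, and `conditional_second` are illustrative and do not come from the paper. The sketch compares two factorizations. The first is a product of per-position marginals, which mimics predicting each masked token independently. The second is the chain-rule factorization used by autoregressive models. The unmasked context is treated as fixed and omitted, so this is not a model of either pretraining method.

```python
from itertools import product

# Hypothetical joint distribution over two masked positions (x1, x2),
# conditioned on some fixed, omitted unmasked context.
joint = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}


def marginal(pos):
    """Marginal distribution of one position, summing the other out of the joint."""
    m = {}
    for pair, p in joint.items():
        m[pair[pos]] = m.get(pair[pos], 0.0) + p
    return m


def conditional_second(first):
    """p(x2 | x1 = first), as used by a chain-rule (autoregressive) factorization."""
    total = sum(p for (a, _), p in joint.items() if a == first)
    return {b: p / total for (a, b), p in joint.items() if a == first}


m1, m2 = marginal(0), marginal(1)

# Independent prediction of each masked token: p(x1) * p(x2).
independent = {(a, b): m1[a] * m2[b] for a, b in product(m1, m2)}

# Autoregressive factorization: p(x1) * p(x2 | x1), which recovers the joint.
autoregressive = {}
for a in m1:
    cond = conditional_second(a)
    for b in m2:
        autoregressive[(a, b)] = m1[a] * cond.get(b, 0.0)

for pair in independent:
    print(pair, "independent:", independent[pair], "chain rule:", autoregressive[pair])
```

Under the independence assumption, the incoherent pair ("New", "Angeles") receives probability 0.25. The chain-rule factorization assigns it 0.0 and matches the joint exactly.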

arxiv preprint arxiv, bert, objective

Country:
- North America
  - Canada (0.04)
  - United States > Pennsylvania
    - Allegheny County > Pittsburgh (0.04)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.72)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
