Language Understanding

Neural Information Processing Systems 

However, XLNet does not leverage the full position information of a sentence and thus suffers from position discrepancy between pre-training and fine-tuning.
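The discrepancy can be illustrated with a small sketch. It assumes the usual permuted language modeling setup, in which the prediction at each step of the factorization order conditions on the tokens already seen plus the position of the token being predicted. The function name `visible_positions` and the example permutation are hypothetical, chosen only for illustration.

```python
def visible_positions(permutation, step):
    """Return the sentence positions available when predicting the token at
    permutation[step]: positions of the tokens already seen in the
    factorization order, plus the position of the target itself.
    (Assumed setup for illustration, not the authors' code.)"""
    return sorted(permutation[: step + 1])


# Hypothetical factorization order over a 6-token sentence.
permutation = [4, 1, 5, 0, 3, 2]
sentence_length = len(permutation)

for step in range(sentence_length):
    seen = visible_positions(permutation, step)
    # During fine-tuning the model sees all `sentence_length` positions;
    # here it sees only len(seen) of them until the final step.
    print(f"step {step}: sees positions {seen} "
          f"({len(seen)} of {sentence_length})")
```

In this sketch, every step except the last conditions on fewer positions than the full sentence provides, whereas a fine-tuning input exposes all positions at once.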