Reviews: XLNet: Generalized Autoregressive Pretraining for Language Understanding

Neural Information Processing Systems 

Originality: The architecture is novel compared to recent lines of language-model work, which all used variations of BERT or GPT (SciBERT, MT-DNN, MASS, etc.). The example (the "New York is a city" one) makes sense, but since the permutation is sampled randomly when computing the objective, I still could not see why it works better than sequential order, given that humans speak and write in sequential order. Could you add more intuition to the paper? Or have you tried predicting n-grams and comparing that to permutation?

Quality: Very high, considering they ran extensive studies on multiple benchmarks; the ablation study is nicely done as well.
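To make the permutation-vs-sequential question concrete, here is a toy sketch (not the paper's implementation; the function name and example are illustrative) of how a random factorization order changes what each token is conditioned on under an autoregressive objective:

```python
import random

def factorization_contexts(tokens, order):
    # For each position, visited in the given factorization order,
    # record the tokens it is conditioned on when predicted.
    contexts = {}
    seen = []
    for pos in order:
        contexts[pos] = [tokens[p] for p in seen]
        seen.append(pos)
    return contexts

tokens = ["New", "York", "is", "a", "city"]

# Standard left-to-right factorization: "York" only ever sees "New".
left_to_right = factorization_contexts(tokens, [0, 1, 2, 3, 4])
print(left_to_right[1])  # ['New']

# A random permutation order: across many sampled permutations,
# "York" can also condition on tokens to its right ("is", "a", "city"),
# so the model sees bidirectional context while staying autoregressive.
order = list(range(len(tokens)))
random.shuffle(order)
permuted = factorization_contexts(tokens, order)
print(permuted)
```

Under a random permutation each position is still predicted autoregressively, but its conditioning set varies from sample to sample, which is the intuition I would like spelled out more explicitly in the paper.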