Controlled Randomness Improves the Performance of Transformer Models
Tobias Deußer, Cong Zhao, Wolfgang Krämer, David Leonhard, Christian Bauckhage, Rafet Sifa
–arXiv.org Artificial Intelligence
The emergence of pre-trained transformer models brought a massive breakthrough in the field of natural language processing. During pre-training, such transformer models can learn generic language representations with strong generalization capabilities by applying a self-supervised learning approach to large text corpora. These pre-trained language models can then be fine-tuned on various downstream tasks without being trained from scratch, which, compared to traditional training methods, significantly reduces training costs while achieving excellent performance. Models like BERT (Devlin et al., 2019), ELECTRA (Clark et al., 2020), or T5 (Raffel et al., 2020) have achieved remarkable results on several language processing tasks, and the most recent developments of even larger language models, made prominent by GPT-3 (Brown et al., 2020) and GPT-4 (OpenAI, 2023) but not limited to these two
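
As a rough illustration of this fine-tuning paradigm, the sketch below loads a pre-trained BERT checkpoint with the Hugging Face transformers library and perturbs its encoder weights with a small amount of Gaussian noise before task-specific training would begin. The noise step, its magnitude (noise_std), and the choice of model and library are assumptions made for illustration, inspired by the paper's title; they are not the authors' exact "controlled randomness" procedure.

    # Minimal sketch, not the paper's method: load a pre-trained model and
    # inject a small amount of random noise into its weights before fine-tuning.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Hypothetical noise injection: add Gaussian noise to the encoder weights.
    noise_std = 1e-4  # assumed magnitude, not taken from the paper
    with torch.no_grad():
        for name, param in model.bert.encoder.named_parameters():
            if param.requires_grad:
                param.add_(torch.randn_like(param) * noise_std)

    # A standard fine-tuning loop on a downstream task would follow, e.g. with
    # torch.optim.AdamW and a task-specific dataset; omitted here for brevity.
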
Oct-20-2023