Controlled Randomness Improves the Performance of Transformer Models
Tobias Deußer, Cong Zhao, Wolfgang Krämer, David Leonhard, Christian Bauckhage, Rafet Sifa
–arXiv.org Artificial Intelligence
The emergence of pre-trained transformer models brought a massive breakthrough in the field of natural language processing. During pre-training, such transformer models can learn generic language representations with strong generalization capabilities by applying a self-supervised learning approach to large text corpora. These pre-trained language models can then be fine-tuned on various downstream tasks without being trained from scratch, which, compared to traditional training methods, significantly reduces training costs while achieving excellent performance. Models like BERT (Devlin et al., 2019), ELECTRA (Clark et al., 2020), or T5 (Raffel et al., 2020) have achieved remarkable results on several language processing tasks, and the most recent developments of even larger language models, made prominent by GPT-3 (Brown et al., 2020) and GPT-4 (OpenAI, 2023) but not limited to these two
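
As a rough illustration of this fine-tuning paradigm, the sketch below loads a pre-trained BERT checkpoint with the Hugging Face transformers library and perturbs its encoder weights with a small amount of Gaussian noise before task-specific training would begin. The noise step, its magnitude (noise_std), and the choice of model and library are assumptions made for illustration, inspired by the paper's title; they are not the authors' exact "controlled randomness" procedure.

    # Minimal sketch, not the paper's method: load a pre-trained model and
    # inject a small amount of random noise into its weights before fine-tuning.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Hypothetical noise injection: add Gaussian noise to the encoder weights.
    noise_std = 1e-4  # assumed magnitude, not taken from the paper
    with torch.no_grad():
        for name, param in model.bert.encoder.named_parameters():
            if param.requires_grad:
                param.add_(torch.randn_like(param) * noise_std)

    # A standard fine-tuning loop on a downstream task would follow, e.g. with
    # torch.optim.AdamW and a task-specific dataset; omitted here for brevity.
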
Oct-20-2023