Small Languages, Big Models: A Study of Continual Training on Languages of Norway

David Samuel, Vladislav Mikhailov, Erik Velldal, Lilja Øvrelid, Lucas Georges Gabriel Charpentier, Andrey Kutuzov

arXiv.org Artificial Intelligence 

Training large language models requires vast amounts of data, posing a challenge for less widely spoken languages like Norwegian, and even more so for truly low-resource languages like Sámi. To address this issue, we present a novel three-stage continual training approach. We also experiment with combining causal and masked language modeling to get more flexible models. Based on our findings, we train, evaluate, and openly release a new large language model. This method enables us to train an 11.4B-parameter model that achieves state-of-the-art performance across Norwegian language tasks while maintaining strong capabilities in Northern Sámi.

The three main research contributions of this paper can be summarized as follows:

1. Novel training method for data-constrained language models: We propose a three-stage training method for efficient adaptation of existing language models to lower-resource languages.