The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
