The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset