An Expanded Massive Multilingual Dataset for High-Performance Language Technologies