Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
