The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining
