When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale