Scalable Data Ablation Approximations for Language Models through Modular Training and Merging