An Integrated Data Processing Framework for Pretraining Foundation Models