Data-Efficient Pretraining with Group-Level Data Influence Modeling