Realistic Evaluation of Model Merging for Compositional Generalization