Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models