Learning Where to Learn: Training Distribution Selection for Provable OOD Performance