Memorizing Long-tail Data Can Help Generalization Through Composition