It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models