Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation