Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis