Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach