A Variational Framework for Improving Naturalness in Generative Spoken Language Models