Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling