DM-Codec: Distilling Multimodal Representations for Speech Tokenization