Semantic Residual for Multimodal Unified Discrete Representation