Unified Multimodal Understanding via Byte-Pair Visual Encoding