Unified Multimodal Understanding via Byte-Pair Visual Encoding
