Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data