Semantic Compression via Multimodal Representation Learning