Efficient Large Multi-modal Models via Visual Context Compression
