Learning Compact Vision Tokens for Efficient Large Multimodal Models

Open in new window