Learning Compact Vision Tokens for Efficient Large Multimodal Models