LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token