Cascaded Self-Evaluation Augmented Training for Efficient Multimodal Large Language Models