ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification

Open in new window