Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training
