Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens

Open in new window