Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts