CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
