Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble

Open in new window