Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?
Jin, Jiahe, He, Yanheng, Yang, Mingyan
arXiv.org Artificial Intelligence
In this work, we identify the "2D-Cheating" problem in 3D LLM evaluation, where these tasks might be easily solved by VLMs with rendered images of point clouds, exposing ineffective evaluation of 3D LLMs' unique 3D capabilities. We test VLM performance across multiple 3D LLM benchmarks and, using this as a reference, propose principles for better assessing genuine 3D understanding. We also advocate explicitly separating 3D abilities ...

Figure 1: Example of 2D-Cheating. With rendered images of the point cloud, VLMs could easily solve some 3D tasks, and even outperform 3D LLMs.
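To make "rendered images of point clouds" concrete, the following is a minimal sketch of one way a colored point cloud could be flattened into a 2D image before being shown to a VLM. It is not the authors' rendering pipeline, which this excerpt does not describe. The function name `render_top_down`, the bird's-eye orthographic view, and the `(x, y, z, (r, g, b))` point layout are all assumptions made for illustration.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float, Color]  # hypothetical layout: x, y, z, (r, g, b)


def render_top_down(
    points: List[Point],
    width: int = 64,
    height: int = 64,
    background: Color = (0, 0, 0),
) -> List[List[Color]]:
    """Orthographic bird's-eye render (illustrative, not the paper's method).

    Each point is projected along the z axis onto the x-y plane; when several
    points land on the same pixel, the one with the largest z (the topmost
    point) wins, which acts as a simple depth buffer.
    """
    image = [[background for _ in range(width)] for _ in range(height)]
    if not points:
        return image

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against zero extent (e.g. all points sharing one x or y value).
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0

    depth = [[float("-inf")] * width for _ in range(height)]
    for x, y, z, color in points:
        # Map x to a column and y to a row (row 0 is the largest y),
        # rounding to the nearest pixel.
        col = min(int((x - x_min) / x_span * (width - 1) + 0.5), width - 1)
        row = min(int((y_max - y) / y_span * (height - 1) + 0.5), height - 1)
        if z > depth[row][col]:
            depth[row][col] = z
            image[row][col] = color
    return image


# Tiny example: two points share the same (x, y); the higher one is visible.
cloud = [
    (0.0, 0.0, 0.0, (255, 0, 0)),  # red, on the floor
    (0.0, 0.0, 2.0, (0, 0, 255)),  # blue, directly above the red point
    (1.0, 1.0, 0.0, (0, 255, 0)),  # green, opposite corner
]
img = render_top_down(cloud, width=4, height=4)
```

An image produced this way contains no explicit 3D structure, only pixels. If a VLM can answer a benchmark question from such an image, the question may not require 3D-specific capabilities, which is the concern the abstract raises.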
Feb-12-2025