Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?

Feb-12-2025–arXiv.org Artificial Intelligence

In this work, we identify the "2D-Cheating" problem in 3D LLM evaluation, where these tasks might be easily solved by VLMs with rendered images of point clouds, exposing ineffective evaluation of 3D LLMs' unique 3D capabilities. We test VLM performance across multiple 3D LLM benchmarks and, using this as a reference, propose principles for better assessing genuine 3D understanding. We also advocate explicitly separating 3D abilities ...

Figure 1: Example of 2D-Cheating. With rendered images of the point cloud, VLMs could easily solve some 3D tasks, and even outperform 3D LLMs.

benchmark, large language model, machine learning
