Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers

Neural Information Processing Systems

By using object identifiers, we cast diverse 3D scene-language tasks into a unified question-answering format, which enables joint training without additional task-specific heads.
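
To make the unified format concrete, the following is a minimal sketch of how different 3D scene-language tasks might be rewritten as question-answer pairs that refer to objects by identifier. It is illustrative only: the `<OBJ012>`-style token format, the prompt wording, and the helper names (`obj_token`, `grounding_to_qa`, `captioning_to_qa`, `scene_qa_to_qa`) are assumptions made for this example, not the authors' implementation.

```python
from dataclasses import dataclass


def obj_token(index: int) -> str:
    """Return an identifier token for an object (hypothetical format)."""
    return f"<OBJ{index:03d}>"


@dataclass
class QAPair:
    question: str
    answer: str


def grounding_to_qa(description: str, target_index: int) -> QAPair:
    # Visual grounding: the answer is simply the identifier of the target object,
    # so no separate box-regression or classification head is required.
    return QAPair(
        question=f"Which object matches this description: {description}",
        answer=obj_token(target_index),
    )


def captioning_to_qa(target_index: int, caption: str) -> QAPair:
    # Dense captioning: the identifier appears in the question instead.
    return QAPair(
        question=f"Describe {obj_token(target_index)} in the scene.",
        answer=caption,
    )


def scene_qa_to_qa(question: str, answer: str) -> QAPair:
    # Scene question answering is already in QA form and passes through unchanged.
    return QAPair(question=question, answer=answer)


# Samples from three tasks now share one text-to-text format and can be
# mixed in a single training set.
samples = [
    grounding_to_qa("the chair next to the window", 12),
    captioning_to_qa(5, "a wooden table in the center of the room"),
    scene_qa_to_qa("How many chairs are there?", "four"),
]

for s in samples:
    print(f"Q: {s.question}\nA: {s.answer}\n")
```

Because every task reduces to generating text, possibly containing identifier tokens, a single language-model output layer can serve all of them, which is the property the sentence above describes.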