Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Arnaud, Sergio, McVay, Paul, Martin, Ada, Majumdar, Arjun, Jatavallabhula, Krishna Murthy, Thomas, Phillip, Partsey, Ruslan, Dugas, Daniel, Gejji, Abha, Sax, Alexander, Berges, Vincent-Pierre, Henaff, Mikael, Jain, Ayush, Cao, Ang, Prasad, Ishita, Kalakrishnan, Mrinal, Rabbat, Michael, Ballas, Nicolas, Assran, Mido, Maksymets, Oleksandr, Rajeswaran, Aravind, Meier, Franziska

arXiv.org Artificial Intelligence 

We present LOCATE 3D, a model for localizing objects in 3D scenes from referring expressions like "the small coffee table between the sofa and the lamp." LOCATE 3D sets a new state-of-the-art on standard referential grounding benchmarks and showcases robust generalization capabilities. Notably, LOCATE 3D operates directly on sensor observation streams (posed RGB-D frames), enabling real-world deployment on robots and AR devices. Key to our approach is 3D-JEPA, a novel self-supervised learning (SSL) algorithm applicable to sensor point clouds. It takes as input a 3D point cloud featurized using 2D foundation models (CLIP, DINO). Subsequently, masked prediction in latent space is employed as a pretext task to aid the self-supervised learning of contextualized point cloud features. Once trained, the 3D-JEPA encoder is finetuned alongside a language-conditioned decoder to jointly predict 3D masks and bounding boxes. Additionally, we introduce LOCATE 3D DATASET, a new dataset for 3D referential grounding, spanning multiple capture setups with over 130K annotations. This enables both a systematic study of generalization capabilities and a stronger model.
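To make the pretext task concrete, the following is a minimal NumPy sketch of masked prediction in latent space over a featurized point cloud. It is illustrative only: the toy random linear maps stand in for the real context encoder, target encoder, and predictor networks (which in the paper are learned deep models), and all names and dimensions here are hypothetical assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N points, each with a feature vector standing in
# for lifted 2D foundation-model features (e.g. CLIP/DINO), plus xyz.
N, feat_dim, latent_dim = 64, 16, 8
points = rng.normal(size=(N, 3))            # xyz coordinates
features = rng.normal(size=(N, feat_dim))   # per-point lifted features

# Toy stand-ins for the context encoder, target encoder, and predictor
# (the real model uses learned networks; these are random linear maps).
W_ctx = rng.normal(size=(feat_dim, latent_dim)) / np.sqrt(feat_dim)
W_tgt = rng.normal(size=(feat_dim, latent_dim)) / np.sqrt(feat_dim)
W_pred = rng.normal(size=(latent_dim, latent_dim)) / np.sqrt(latent_dim)

# Mask a random subset of points: the pretext task is to predict the
# target encoder's latents for masked points from the visible ones.
mask = rng.random(N) < 0.4
visible = ~mask

ctx_latents = features[visible] @ W_ctx   # encode only visible points
targets = features[mask] @ W_tgt          # latents to be predicted

# Toy predictor: pool the visible context, then emit one prediction
# per masked point (a real predictor would condition on point position).
context = ctx_latents.mean(axis=0)
preds = np.tile(context @ W_pred, (int(mask.sum()), 1))

# JEPA-style objective: regression in latent space, not pixel space.
loss = float(np.mean((preds - targets) ** 2))
print(loss >= 0.0, preds.shape == targets.shape)
```

The key design point the sketch preserves is that the loss is computed between latent representations rather than raw inputs, which is what distinguishes JEPA-style objectives from reconstruction-based masked autoencoding.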
