Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention
–Neural Information Processing Systems
Multi-object 3D Grounding involves locating 3D boxes based on a given query phrase from a point cloud. It is a challenging and significant task with numerous applications in visual understanding, human-computer interaction, and robotics. To tackle this challenge, we introduce D-LISA, a two-stage approach incorporating three innovations. First, a dynamic vision module that enables a variable and learnable number of box proposals.
Neural Information Processing Systems
Mar-27-2025, 11:52:46 GMT
- Genre:
- Research Report > Experimental Study (0.93)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.46)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Vision (1.00)
- Information Technology > Artificial Intelligence