AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments

Dec-23-2025, 22:50:58 GMT – Neural Information Processing Systems

Recent years have seen embodied visual navigation advance in two distinct directions: (i) equipping the AI agent to follow natural language instructions, and (ii) making the navigable world multimodal, e.g., audio-visual navigation. However, the real world is not only multimodal but also often complex; thus, in spite of these advances, agents still need to understand the uncertainty in their actions and seek instructions to navigate.

audio-visual-language embodied navigation, avlen, name change

Conferences Web Page

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.39)