Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision
Neural Information Processing Systems
In the Vision-and-Language Navigation (VLN) task, an agent is asked to navigate inside 3D indoor environments by following given natural-language instructions. Cross-modal alignment is one of the most critical challenges in VLN, because the predicted trajectory must accurately match the given instruction.
Feb-7-2026, 08:05:27 GMT