Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision

Neural Information Processing Systems 

In the Vision-and-Language Navigation (VLN) task, an agent is asked to navigate 3D indoor environments by following given instructions. Cross-modal alignment is one of the most critical challenges in VLN because the predicted trajectory needs to match the given instruction accurately.
