Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation

Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge

arXiv.org Artificial Intelligence 

Advances in learning and representations have reinvigorated work that connects language to other modalities. A particularly exciting direction is Vision-and-Language Navigation (VLN), in which agents interpret natural language instructions and visual scenes to move through environments and reach goals. Despite recent progress, current research leaves unclear how much of a role language understanding plays in this task, especially because dominant evaluation metrics have focused on goal completion rather than the sequence of actions corresponding to the instructions. Here, we highlight shortcomings of current metrics for the Room-to-Room dataset (Anderson et al., 2018b) and propose a new metric, Coverage weighted by Length Score (CLS). We also show that the existing paths in the dataset are not ideal for evaluating instruction following because they are direct-to-goal shortest paths.

Figure 1: It's the journey, not just the goal. To give language its due place in VLN, we compose paths in the R2R dataset to create longer, twistier R4R paths (blue). Under commonly used metrics, agents that head straight to the goal (red) are not penalized for ignoring the language instructions: for instance, SPL yields a perfect 1.0 score for the red and only 0.17 for the orange path. In contrast, our proposed CLS metric measures fidelity to the reference path, strongly preferring the […]
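The abstract names CLS but this page does not define it. The sketch below is an illustrative Python implementation of a coverage-weighted path-fidelity score in the spirit the caption describes: a coverage term rewards getting close to every node of the reference path, and a length term penalizes predicted paths whose length deviates from the coverage-implied length. The function name `cls_score`, the exponential coverage kernel, and the threshold `d_th` are assumptions here, not taken from this page; consult the paper for the exact definition.

```python
import math


def path_length(path):
    """Sum of Euclidean segment lengths along a path of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def cls_score(pred, ref, d_th=3.0):
    """Illustrative coverage-weighted length score (a sketch, not the
    paper's exact formula).

    pred, ref: lists of (x, y) coordinates for the predicted and
    reference paths. Returns a value in [0, 1]; 1.0 means the predicted
    path covers the reference closely and has matching length.
    """
    # Coverage: average closeness of the predicted path to each
    # reference node, decayed exponentially with distance.
    pc = sum(
        math.exp(-min(math.dist(r, p) for p in pred) / d_th)
        for r in ref
    ) / len(ref)
    # Length implied by that coverage of the reference path.
    epl = pc * path_length(ref)
    lp = path_length(pred)
    # Length score: 1.0 when the predicted length matches epl exactly,
    # shrinking as they diverge.
    ls = epl / (epl + abs(epl - lp)) if epl > 0 else 0.0
    return pc * ls
```

On the caption's example, a shortcut that heads straight for the goal covers fewer reference nodes and is shorter than the reference, so both factors drop; retracing the reference path exactly scores 1.0.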
