SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

Neural Information Processing Systems 

Specifically, we vary which inputs serve only as keys and values and which serve as queries, keys, and values in the multimodal transformer. Results for our model with selective object attention are presented in row 5. In Table 3, we provide results for all possible combinations of these three modules. We highlight success with a green box when the agent reaches the goal and failure cases with a red box.
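The distinction between inputs that serve only as keys and values and inputs that also serve as queries can be illustrated with a minimal single-head attention sketch. This is not the SOAT implementation. The group names (`instruction`, `scene`, `object`), the helper names, and the use of identity projections in place of learned query/key/value matrices are hypothetical simplifications. Every group contributes keys and values, and only the groups listed in `query_groups` issue queries, so only those groups receive updated representations.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    # Scaled dot-product attention: each query returns a weighted average of values.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def mixed_role_attention(tokens: Dict[str, List[Vector]],
                         query_groups: Sequence[str]) -> Dict[str, List[Vector]]:
    # Hypothetical helper: all token groups are pooled as keys and values,
    # but only groups in query_groups act as queries and get updated.
    # Identity projections stand in for learned W_Q, W_K, W_V (assumption).
    kv = [t for group in tokens for t in tokens[group]]
    return {g: attend(tokens[g], kv, kv) for g in query_groups}


if __name__ == "__main__":
    tokens = {
        "instruction": [[1.0, 0.0], [0.0, 1.0]],
        "scene": [[1.0, 1.0]],
        "object": [[0.5, -0.5], [-0.5, 0.5]],
    }
    # Variant 1: every group serves as queries, keys, and values.
    full = mixed_role_attention(tokens, query_groups=list(tokens))
    # Variant 2: object tokens serve only as keys and values.
    kv_only_objects = mixed_role_attention(tokens, query_groups=["instruction", "scene"])
    print(sorted(full))             # ['instruction', 'object', 'scene']
    print(sorted(kv_only_objects))  # ['instruction', 'scene']
```

In the second variant the object tokens still shape the instruction and scene outputs through the attention weights, but they are never rewritten themselves, which is the kind of role assignment the ablation varies.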
