SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

Feb-8-2026, 07:25:43 GMT–Neural Information Processing Systems

Specifically, we vary which inputs serve only as keys and values and which serve as queries, keys, and values in the multimodal transformer. Results for our model with selective object attention are presented in row 5. In Table 3, we provide results for all possible combinations of these three modules. We highlight successful episodes, in which the agent reaches the goal, with a green box and failure cases with a red box.
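The excerpt distinguishes inputs that act as queries, keys, and values from inputs that act only as keys and values. A minimal NumPy sketch of that distinction follows, assuming single-head scaled dot-product attention with identity projections (real transformers use learned Q/K/V projections and multiple heads). The function names, and the choice of text and scene tokens as full participants with object tokens as key/value-only inputs, are hypothetical illustrations rather than SOAT's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention (no learned projections)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

def multimodal_layer(full_tokens, kv_only_tokens):
    """Hypothetical layer: full_tokens serve as queries, keys and values;
    kv_only_tokens are attended to (keys/values) but produce no outputs."""
    kv = np.concatenate([full_tokens, kv_only_tokens], axis=0)
    return attention(full_tokens, kv, kv)

# Usage with hypothetical token sets of width d.
rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(5, d))     # hypothetical instruction tokens
scene = rng.normal(size=(3, d))    # hypothetical scene tokens
objects = rng.normal(size=(4, d))  # hypothetical object tokens (keys/values only)
full = np.concatenate([text, scene], axis=0)
out = multimodal_layer(full, objects)
print(out.shape)  # (8, 8)
```

Only the query-side tokens receive updated representations; the key/value-only tokens influence those updates through the attention weights but are not themselves rewritten, which is one way to read the "keys and values only" role described above.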

object feature, soat, vision-and-language navigation
