vision-and-language navigation
Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions
Heng Li
Vision-and-Language Navigation (VLN) aims to develop embodied agents that navigate based on human instructions. However, current VLN frameworks often rely on static environments and optimal expert supervision, limiting their real-world applicability. To address this, we introduce Human-Aware Vision-and-Language Navigation (HA-VLN), extending traditional VLN by incorporating dynamic human activities and relaxing key assumptions.
History Aware Multimodal Transformer for Vision-and-Language Navigation
HAMT efficiently encodes all the past panoramic observations via a hierarchical vision transformer (ViT), which first encodes individual images with ViT, then models spatial relation between images in a panoramic observation and finally takes into account temporal relation between panoramas in the history.
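The three-level encoding described above can be illustrated with a minimal sketch. This is not the HAMT implementation: it assumes a parameter-free, single-head self-attention as a stand-in for each transformer level, mean pooling to collapse each level into one vector, and omits positional and temporal embeddings. All function names (`encode_image`, `encode_panorama`, `encode_history`) are hypothetical and chosen for illustration only.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    # Assumption: queries, keys, and values are the tokens themselves
    # (no learned projections), which keeps the sketch dependency-free.
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)])
    return out


def mean_pool(tokens: List[Vector]) -> Vector:
    # Collapse a set of tokens into a single vector.
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


def encode_image(patches: List[Vector]) -> Vector:
    # Level 1: stand-in for a ViT applied to one view (attention over patches).
    return mean_pool(self_attention(patches))


def encode_panorama(views: List[List[Vector]]) -> Vector:
    # Level 2: spatial relation between the views of one panorama.
    return mean_pool(self_attention([encode_image(v) for v in views]))


def encode_history(panoramas: List[List[List[Vector]]]) -> List[Vector]:
    # Level 3: temporal relation between panoramas; one output token per past step.
    return self_attention([encode_panorama(p) for p in panoramas])


# Toy history: 3 time steps, 4 views per panorama, 2 patches per view, dimension 3.
history = [
    [[[0.1 * (t + v + p + d) for d in range(3)] for p in range(2)] for v in range(4)]
    for t in range(3)
]
tokens = encode_history(history)
print(len(tokens), len(tokens[0]))
```

The point of the hierarchy is that each level attends over a short sequence (patches, then views, then steps) rather than over every patch of every past panorama at once.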
Frequency-enhanced Data Augmentation for Vision-and-Language Navigation: Supplemental Material
Keji He
Table 1 presents the impact of different random seeds used to sample the interference images. Experiments in the main manuscript use seed-1, which yields average performance. Figure 1: Navigation examples in normal and high-frequency perturbed scenes. In the examples shown in Figure 4, both models obtained similar textual attention. In Figure 6, according to the given instruction, the agent should turn left to enter the room corresponding to the second view.