speaker-follower model
Speaker-Follower Models for Vision-and-Language Navigation
Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. Natural language instructions typically identify only a few high-level decisions and landmarks rather than complete low-level motor behaviors; much of the missing information must be inferred based on perceptual context. In machine learning settings, this is doubly challenging: it is difficult to collect enough annotated data to enable learning of this reasoning process from scratch, and also difficult to implement the reasoning process using generic sequence models. Here we describe an approach to vision-and-language navigation that addresses both these issues with an embedded speaker model. We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction. Both steps are supported by a panoramic action space that reflects the granularity of human-generated instructions. Experiments show that all three components of this approach---speaker-driven data augmentation, pragmatic reasoning and panoramic action space---dramatically improve the performance of a baseline instruction follower, more than doubling the success rate over the best existing approach on a standard benchmark.
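To make the pragmatic-reasoning step concrete, the following is a minimal sketch of rescoring a follower's candidate routes with a speaker model. The helper names (`speaker_log_prob`, the candidate-list format) and the single interpolation weight `alpha` are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: rerank candidate routes by combining follower and speaker scores.
# `follower_candidates` is assumed to be a list of (route, follower_log_prob)
# pairs from beam search; `speaker_log_prob(instruction, route)` is assumed to
# return log P_speaker(instruction | route). Both are placeholders.

def rescore_candidates(instruction, follower_candidates, speaker_log_prob, alpha=0.5):
    """Rank routes by a weighted combination of follower and speaker
    log-probabilities: prefer routes under which the given instruction
    would have been likely to be produced."""
    scored = []
    for route, follower_lp in follower_candidates:
        speaker_lp = speaker_log_prob(instruction, route)
        combined = alpha * speaker_lp + (1.0 - alpha) * follower_lp
        scored.append((combined, route))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best route first
    return [route for _, route in scored]
```

A single scalar weight is one common way to trade off trust in the speaker against the follower; the specific weighting scheme and value here are assumptions rather than the paper's reported configuration.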
Reviews: Speaker-Follower Models for Vision-and-Language Navigation
This paper builds upon the indoor vision-and-language-grounded navigation task and sequence-to-sequence model described in (Anderson et al., 2017), by introducing three improvements: 1) An encoder-decoder-like architecture, dubbed the "speaker-follower" model, that not only decodes natural language instructions into a sequence of navigation actions using seq2seq, but also decodes a sequence of navigation actions and image features into a sequence of natural language instructions using a symmetric seq2seq. That speaker model can then be used for scoring candidate routes (i.e., candidate sequences of images and actions) w.r.t. the likelihood of the natural language instruction under the speaker model. This enables a form of planning for the seq2seq-based agent. The image and motion space are decomposed into 12 yaw and 3 pitch angles. The authors achieve state-of-the-art performance on the task and provide a good ablation analysis of the impact of their three improvements, although I would have liked to see navigation attention maps in the appendix as well.
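To illustrate the panoramic discretization the reviewer mentions (12 yaw headings by 3 pitch levels), the short sketch below enumerates the resulting 36 view angles. The function name, the 30-degree spacing, and the specific pitch values are assumptions for illustration, not taken from the paper's code.

```python
import math

def panoramic_views(num_headings=12, pitches_deg=(-30.0, 0.0, 30.0)):
    """Enumerate discrete (heading, pitch) view angles in radians,
    e.g. 12 headings x 3 pitches = 36 views per panorama."""
    views = []
    for h in range(num_headings):
        heading = 2.0 * math.pi * h / num_headings  # evenly spaced yaw
        for pitch_deg in pitches_deg:
            views.append((heading, math.radians(pitch_deg)))
    return views

print(len(panoramic_views()))  # -> 36 candidate view directions per panorama
```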
Multimodal Sequential Generative Models for Semi-Supervised Language Instruction Following
Akuzawa, Kei, Iwasawa, Yusuke, Matsuo, Yutaka
Agents that can follow language instructions are expected to be useful in a variety of situations such as navigation. However, training neural network-based agents requires large numbers of paired trajectories and language instructions. This paper proposes using multimodal generative models for semi-supervised learning in instruction following tasks. The models learn a shared representation of the paired data, and enable semi-supervised learning by reconstructing unpaired data through that representation. Key challenges in applying the models to sequence-to-sequence tasks such as instruction following are learning a shared representation of variable-length multimodal data and incorporating attention mechanisms. To address these problems, this paper proposes a novel network architecture that absorbs the difference in the sequence lengths of the multimodal data. In addition, to further improve performance, this paper shows how to combine the generative model-based approach with an existing semi-supervised method called a speaker-follower model, and proposes a regularization term that improves inference using unpaired trajectories. Experiments on the BabyAI and Room-to-Room (R2R) environments show that the proposed method improves the performance of instruction following by leveraging unpaired data, and improves the performance of the speaker-follower model by 2% to 4% on R2R.
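To give a rough sense of how a shared latent representation can support learning from both paired and unpaired data, here is a heavily simplified sketch. All class and variable names, the fixed-size feature assumption, and the Gaussian latent are assumptions of this sketch; the paper's central contributions of handling variable-length sequences and attention are deliberately omitted.

```python
# Minimal sketch (assumptions throughout) of a shared latent variable over two
# modalities. Trajectories and instructions are each reduced to fixed-size
# feature vectors for simplicity.
import torch
import torch.nn as nn

class SharedLatentModel(nn.Module):
    def __init__(self, traj_dim=64, lang_dim=64, latent_dim=32):
        super().__init__()
        # Per-modality encoders to the parameters of a shared Gaussian latent.
        self.enc_traj = nn.Linear(traj_dim, 2 * latent_dim)
        self.enc_lang = nn.Linear(lang_dim, 2 * latent_dim)
        # Per-modality decoders from the shared latent.
        self.dec_traj = nn.Linear(latent_dim, traj_dim)
        self.dec_lang = nn.Linear(latent_dim, lang_dim)

    def encode(self, x, encoder):
        mu, logvar = encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return z, kl

    def loss(self, traj=None, lang=None):
        """Reconstruction plus KL. A paired example supervises both decoders
        from one latent; an unpaired example of either modality still trains
        its own encoder/decoder through the shared latent space."""
        mse = nn.MSELoss()
        if traj is not None and lang is not None:   # paired example
            z, kl = self.encode(traj, self.enc_traj)
            recon = mse(self.dec_traj(z), traj) + mse(self.dec_lang(z), lang)
        elif traj is not None:                      # unpaired trajectory
            z, kl = self.encode(traj, self.enc_traj)
            recon = mse(self.dec_traj(z), traj)
        else:                                       # unpaired instruction
            z, kl = self.encode(lang, self.enc_lang)
            recon = mse(self.dec_lang(z), lang)
        return recon + kl.mean()
```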
Speaker-Follower Models for Vision-and-Language Navigation
Fried, Daniel, Hu, Ronghang, Cirik, Volkan, Rohrbach, Anna, Andreas, Jacob, Morency, Louis-Philippe, Berg-Kirkpatrick, Taylor, Saenko, Kate, Klein, Dan, Darrell, Trevor
Navigation guided by natural language instructions presents a challenging reasoning problem for instruction followers. Natural language instructions typically identify only a few high-level decisions and landmarks rather than complete low-level motor behaviors; much of the missing information must be inferred based on perceptual context. In machine learning settings, this is doubly challenging: it is difficult to collect enough annotated data to enable learning of this reasoning process from scratch, and also difficult to implement the reasoning process using generic sequence models. Here we describe an approach to vision-and-language navigation that addresses both these issues with an embedded speaker model. We use this speaker model to (1) synthesize new instructions for data augmentation and to (2) implement pragmatic reasoning, which evaluates how well candidate action sequences explain an instruction. Both steps are supported by a panoramic action space that reflects the granularity of human-generated instructions. Experiments show that all three components of this approach---speaker-driven data augmentation, pragmatic reasoning and panoramic action space---dramatically improve the performance of a baseline instruction follower, more than doubling the success rate over the best existing approach on a standard benchmark.
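As a sketch of the speaker-driven data augmentation described above: sample routes in the training environments, have the speaker generate a synthetic instruction for each, and mix the resulting pairs into the follower's training data. The helper names (`sample_route`, `speaker_generate`) and the pretrain-then-finetune note are assumptions for illustration, not the paper's code.

```python
def augment_training_data(environments, sample_route, speaker_generate,
                          real_pairs, num_synthetic=1000):
    """Build an augmented set of (instruction, route) training pairs by
    pairing sampled routes with speaker-generated instructions.
    `sample_route(env)` and `speaker_generate(route)` are assumed helpers."""
    synthetic_pairs = []
    for i in range(num_synthetic):
        env = environments[i % len(environments)]
        route = sample_route(env)              # e.g. a path between two sampled viewpoints
        instruction = speaker_generate(route)  # speaker "back-translates" the route
        synthetic_pairs.append((instruction, route))
    # One plausible recipe: pretrain the follower on the mixed data, then
    # fine-tune on the original human-annotated pairs.
    return real_pairs + synthetic_pairs
```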