Verbal Instruction


Reviews: Tagger: Deep Unsupervised Perceptual Grouping

Neural Information Processing Systems

UPDATE: I thank the authors for their convincing rebuttal; in view of the promised updates to the technical specifications and the description of the method, I have increased the scores for "Technical quality" and "Clarity and presentation". The only major concern I still have is the lack of a suitable baseline to compare with. In particular, I do not agree that a comparison to [1] is impossible without their code. Instead, I'd encourage the authors to compare their method on the multi-MNIST benchmark described in Figure 1 of [1] (and simply to use the numbers provided by [1] for comparison, without re-simulation). This would significantly strengthen the results.

Unfortunately, however, I see two major flaws with the current presentation of the material.

** Literature and comparison to competitors

First, the literature on this topic does not seem to be suitably accounted for.


Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models

Totsila, Dionis, Rouxel, Quentin, Mouret, Jean-Baptiste, Ivaldi, Serena

arXiv.org Artificial Intelligence

This paper presents Words2Contact, a language-guided multi-contact placement pipeline leveraging large language models and vision language models. Our method is a key component for language-assisted teleoperation and human-robot cooperation, where human operators can instruct the robots where to place their support contacts before whole-body reaching or manipulation using natural language. Words2Contact transforms the verbal instructions of a human operator into contact placement predictions; it also handles iterative corrections until the human is satisfied with the contact location identified in the robot's field of view. We benchmark state-of-the-art LLMs and VLMs for size and performance in contact prediction. We demonstrate the effectiveness of the iterative correction process, showing that users, even naive ones, quickly learn how to instruct the system to obtain accurate locations. Finally, we validate Words2Contact in real-world experiments with the Talos humanoid robot, instructed by human operators to place support contacts on different locations and surfaces to avoid falling when reaching for distant objects.
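The iterative correction process described above can be illustrated with a minimal sketch. This is not the authors' code: the function name, the pixel step size, and the simplified correction vocabulary ("left", "right", "up", "down") are all hypothetical stand-ins for the LLM-parsed corrections in the actual pipeline.

```python
def refine_contact(initial, corrections, step=20):
    """Apply a sequence of verbal corrections to an (x, y) contact
    prediction in image coordinates. Each correction shifts the
    predicted contact point by `step` pixels; in the real system the
    corrections would be free-form language parsed by an LLM."""
    offsets = {"left": (-step, 0), "right": (step, 0),
               "up": (0, -step), "down": (0, step)}
    x, y = initial
    for word in corrections:
        dx, dy = offsets[word]
        x, y = x + dx, y + dy
    return (x, y)
```

For example, an initial prediction at (100, 100) corrected with "left" then "up" ends at (80, 80); the loop would repeat until the operator confirms the location.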


Interactive Task Encoding System for Learning-from-Observation

Wake, Naoki, Kanehira, Atsushi, Sasabuchi, Kazuhiro, Takamatsu, Jun, Ikeuchi, Katsushi

arXiv.org Artificial Intelligence

We present the Interactive Task Encoding System (ITES) for teaching robots to perform manipulative tasks. ITES is designed as an input system for the Learning-from-Observation (LfO) framework, which enables household robots to be programmed using few-shot human demonstrations without the need for coding. In contrast to previous LfO systems that rely solely on visual demonstrations, ITES leverages both verbal instructions and interaction to enhance recognition robustness, thus enabling multimodal LfO. ITES identifies tasks from verbal instructions and extracts parameters from visual demonstrations. Meanwhile, the recognition result is reviewed by the user for interactive correction. Evaluations conducted on a real robot demonstrate the successful teaching of multiple operations for several scenarios, suggesting the usefulness of ITES for multimodal LfO. The source code is available at https://github.com/microsoft/symbolic-robot-teaching-interface.
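The split the abstract describes, with task identification from the verbal channel, parameters from the visual channel, and a user review step, can be sketched as follows. The task keywords, dictionary format, and function names here are hypothetical illustrations, not ITES's actual interface.

```python
def encode_task(verbal, demo_params, confirm):
    """Pair a task identified from a verbal instruction with parameters
    extracted from the visual demonstration, then let the user review
    the result. `demo_params` stands in for vision-derived parameters;
    `confirm` is a callback returning True if the user accepts."""
    task_keywords = {"pick": "PICK", "place": "PLACE", "open": "OPEN"}
    task = next((t for kw, t in task_keywords.items()
                 if kw in verbal.lower()), None)
    result = {"task": task, "params": demo_params}
    # Interactive correction: a rejected recognition triggers re-teaching.
    return result if confirm(result) else None
```

A rejected result (confirm returning False) would send the system back to the demonstration phase, which is the interactive-correction loop the abstract refers to.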


An artificial neural network to acquire grounded representations of robot actions and language

#artificialintelligence

To best assist human users while they complete everyday tasks, robots should be able to understand their queries, answer them and perform actions accordingly. In other words, they should be able to flexibly generate and perform actions that are aligned with a user's verbal instructions. To understand a user's instructions and act accordingly, robotic systems should be able to make associations between linguistic expressions, actions and environments. Deep neural networks have proved to be particularly good at acquiring representations of linguistic expressions, yet they typically need to be trained on large datasets including robot actions, linguistic descriptions and information about different environments. Researchers at Waseda University in Tokyo recently developed a deep neural network that can acquire grounded representations of robot actions and linguistic descriptions of these actions.


Language Bootstrapping: Learning Word Meanings From Perception-Action Association

Salvi, Giampiero, Montesano, Luis, Bernardino, Alexandre, Santos-Victor, José

arXiv.org Machine Learning

We address the problem of bootstrapping language acquisition for an artificial system similarly to what is observed in experiments with human infants. Our method works by associating meanings to words in manipulation tasks, as a robot interacts with objects and listens to verbal descriptions of the interactions. The model is based on an affordance network, i.e., a mapping between robot actions, robot perceptions, and the perceived effects of these actions upon objects. We extend the affordance model to incorporate spoken words, which allows us to ground the verbal symbols to the execution of actions and the perception of the environment. The model takes verbal descriptions of a task as the input and uses temporal co-occurrence to create links between speech utterances and the involved objects, actions, and effects. We show that the robot is able to form useful word-to-meaning associations, even without considering grammatical structure in the learning process and in the presence of recognition errors. These word-to-meaning associations are embedded in the robot's own understanding of its actions. Thus, they can be directly used to instruct the robot to perform tasks and also make it possible to incorporate context in the speech recognition task. We believe that the encouraging results with our approach may afford robots the capacity to acquire language descriptors in their operating environment, as well as shed some light on how this challenging process develops with human infants.
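The temporal co-occurrence idea in the abstract can be illustrated with a minimal counting sketch: words heard in the same time window as perceived symbols accumulate joint counts, and each word is linked to its most frequent co-occurring symbol. This is a deliberately simplified stand-in, not the affordance-network model the paper actually uses, and the data format is hypothetical.

```python
from collections import Counter, defaultdict

def cooccurrence(episodes):
    """episodes: list of (utterance_words, perceived_symbols) pairs,
    where the symbols are the objects, actions, and effects observed in
    the same time window as the utterance. Returns a word -> symbol map
    based on raw co-occurrence counts."""
    counts = defaultdict(Counter)
    for words, symbols in episodes:
        for w in words:
            counts[w].update(symbols)
    # Link each word to its most frequently co-occurring symbol.
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}
```

With enough episodes, a word like "grasps" co-occurs with the grasp action across different objects, so the object symbols wash out and the word is linked to the action, even without any grammatical analysis.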


An Interactive Approach for Situated Task Teaching through Verbal Instructions

Mericli, Cetin (Carnegie Mellon University) | Klee, Steven D. (Carnegie Mellon University) | Paparian, Jack (Carnegie Mellon University) | Veloso, Manuela (Carnegie Mellon University)

AAAI Conferences

The ability to specify a task without having to write special software is an important and prominent feature for a mobile service robot deployed in a crowded office environment, working around and interacting with people. In this paper, we contribute an interactive approach for enabling users to teach tasks to a mobile service robot through verbal commands. The input is given as typed or spoken instructions, which are then mapped to the available sensing and actuation primitives on the robot. The main contributions of this work are the addition of conditionals on sensory information that allow the specified actions to be executed in a closed-loop manner, and a correction mode that allows an existing task to be modified or corrected at a later time by providing a replacement action during the test execution. We describe all the components of the system along with the implementation details and illustrative examples in depth. We also discuss the extensibility of the presented system, and point out potential future extensions.