Towards Understanding Language through Perception in Situated Human-Robot Interaction: From Word Grounding to Grammar Induction
Aly, Amir, Taniguchi, Tadahiro
–arXiv.org Artificial Intelligence
Robots increasingly collaborate with human users in different tasks that require high-level cognitive functions so that they can discover the surrounding environment. A difficult challenge that we briefly highlight in this short paper is inferring the latent grammatical structure of language, which includes grounding parts of speech (e.g., verbs, nouns, adjectives, and prepositions) through visual perception, and inducing a Combinatory Categorial Grammar (CCG) for phrases. This paves the way towards grounding phrases so that a robot can understand human instructions appropriately during interaction. Grounding words through visual perception, using a probabilistic generative model, aims to make a robot able to understand the meaning of action verbs, object characteristics (i.e., color and geometry), and spatial relationships between objects in space through a cross-situational learning context between a human tutor and a robot (without any prior knowledge of language). This implies inducing unsupervised Part-of-Speech (POS) tags that represent the syntactic categories of words and grounding them, together with the meaning of words, through visual perceptual information (Figures 1 & 2).
Dec-12-2018
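To make the cross-situational grounding idea concrete, the sketch below estimates word-to-percept association probabilities with a simple EM procedure in the style of IBM Model 1. This is a minimal illustration only, not the probabilistic generative model used in the paper; the utterances, perceptual feature labels, and tutoring episodes are hypothetical.

```python
# Minimal cross-situational word-grounding sketch (illustrative, not the
# authors' model). Each "situation" pairs an utterance with the perceptual
# features observed at the same time; word-feature association probabilities
# are estimated with an EM procedure in the style of IBM Model 1.
from collections import defaultdict
from itertools import product

# Hypothetical tutoring episodes: (spoken words, perceived visual features).
situations = [
    (["push", "the", "red", "ball"],  ["action:push", "color:red", "shape:sphere"]),
    (["lift", "the", "red", "cube"],  ["action:lift", "color:red", "shape:cube"]),
    (["push", "the", "blue", "cube"], ["action:push", "color:blue", "shape:cube"]),
    (["lift", "the", "blue", "ball"], ["action:lift", "color:blue", "shape:sphere"]),
]

words = {w for ws, _ in situations for w in ws}
features = {f for _, fs in situations for f in fs}

# Uniform initialisation of P(feature | word).
prob = {(w, f): 1.0 / len(features) for w, f in product(words, features)}

for _ in range(30):  # EM iterations
    counts = defaultdict(float)
    totals = defaultdict(float)
    for ws, fs in situations:
        for f in fs:
            norm = sum(prob[(w, f)] for w in ws)
            for w in ws:
                frac = prob[(w, f)] / norm  # expected alignment of f to w
                counts[(w, f)] += frac
                totals[w] += frac
    prob = {(w, f): counts[(w, f)] / totals[w] if totals[w] else prob[(w, f)]
            for w, f in product(words, features)}

# Content words should concentrate their mass on the consistently
# co-occurring feature, e.g. "red" -> color:red, "push" -> action:push.
for w in ["push", "red", "ball"]:
    best = max(features, key=lambda f: prob[(w, f)])
    print(f"{w:5s} -> {best}  (p={prob[(w, best)]:.2f})")
```

Running the sketch shows the hypothetical content words ("push", "red", "ball") concentrating their probability mass on the visual feature they consistently co-occur with, while a function word such as "the" stays diffuse, which is the basic cross-situational learning effect the abstract refers to.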