From Visuo-Motor to Language
Semwal, Deepali (Indian Institute of Technology) | Gupta, Sunakshi (Indian Institute of Technology) | Mukerjee, Amitabha (Indian Institute of Technology)
We propose a learning agent that first learns concepts in an integrated, cross-modal manner, and then uses these as the semantic model for mapping language. We consider an abstract model for the action of throwing, modeling the entire trajectory. From a large set of throws, we take the trajectory images and the throwing parameters. These are mapped jointly onto a low-dimensional non-linear manifold. Such models improve with practice, and can be used as the starting point for real-life tasks such as aiming darts or recognizing throws by others. How can such models be used in learning language? We consider a set of videos involving throwing and rolling actions. These actions are analyzed into a set of contrastive semantic classes based on agent, action, and the thrown object (trajector). We obtain crowdsourced commentaries for these videos (raw text) from a number of adults. The learner attempts to associate labels with each semantic class using contrastive probabilities. Only a handful of high-confidence words are found, but the agent starts off with this partial knowledge. These are used to learn incrementally larger syntactic patterns, initially for the trajector, and eventually for full agent-trajector-action sentences. We demonstrate how this may work for two completely different languages, English and Hindi, and also show how rudiments of agreement, synonymy and polysemy are detected.
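The abstract does not spell out an implementation. As a rough illustration of the cross-modal embedding step, the sketch below jointly embeds placeholder trajectory-image features and throwing parameters on a low-dimensional manifold using Isomap. The feature extraction, the choice of Isomap, the dimensionality, and the data are all assumptions made for illustration, not the authors' pipeline.

```python
# Illustrative sketch (not the authors' implementation): jointly embedding
# trajectory images and throwing parameters on a low-dimensional manifold.
# Isomap, the feature sizes, and the random data are placeholder assumptions.
import numpy as np
from sklearn.manifold import Isomap
from sklearn.preprocessing import StandardScaler

n_throws = 500
image_features = np.random.rand(n_throws, 16 * 16)  # flattened trajectory images (placeholder)
throw_params = np.random.rand(n_throws, 3)          # e.g. release angle, speed, height (placeholder)

# Concatenate both modalities so the embedding is learned cross-modally.
joint = np.hstack([StandardScaler().fit_transform(image_features),
                   StandardScaler().fit_transform(throw_params)])

embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(joint)
print(embedding.shape)  # (500, 2): each throw becomes a point on the learned manifold
```

The contrastive labeling step could likewise be approximated as follows: a word is associated with a semantic class only when its probability within that class's commentaries clearly dominates its probability in the contrasting classes. The class names, the toy commentaries, and the 0.9 threshold are hypothetical; the actual scoring and crowdsourced data are not given in the abstract.

```python
# Illustrative sketch (hypothetical data and threshold): associating words with
# semantic classes via contrastive probabilities over crowdsourced commentaries.
from collections import Counter

commentaries = {
    "throw-ball": ["he throws the ball", "the ball is thrown hard"],
    "roll-ball":  ["she rolls the ball", "the ball rolls slowly"],
}

def contrastive_scores(commentaries):
    counts = {c: Counter(w for s in sents for w in s.split())
              for c, sents in commentaries.items()}
    totals = {c: sum(cnt.values()) for c, cnt in counts.items()}
    scores = {}
    for c, cnt in counts.items():
        for w, n in cnt.items():
            p_in = n / totals[c]
            p_out = sum(counts[o][w] for o in counts if o != c) / \
                    sum(totals[o] for o in totals if o != c)
            # Contrastive score: how strongly the word prefers this class
            # over all contrasting classes.
            scores[(w, c)] = p_in / (p_in + p_out + 1e-9)
    return scores

scores = contrastive_scores(commentaries)
high_conf = {wc: s for wc, s in scores.items() if s > 0.9}
print(high_conf)  # e.g. ('throws', 'throw-ball') scores near 1.0; shared words like 'ball' do not
```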
Nov-1-2014
- Industry:
  - Health & Medicine (0.47)
- Technology:
  - Information Technology > Artificial Intelligence
    - Cognitive Science (1.00)
    - Machine Learning (1.00)
    - Natural Language > Text Processing (1.00)
    - Representation & Reasoning (1.00)