Iwahashi, Naoto
Symbol Emergence in Cognitive Developmental Systems: A Survey
Taniguchi, Tadahiro, Ugur, Emre, Hoffmann, Matej, Jamone, Lorenzo, Nagai, Takayuki, Rosman, Benjamin, Matsuka, Toshihiko, Iwahashi, Naoto, Oztop, Erhan, Piater, Justus, Wörgötter, Florentin
Humans use signs, e.g., sentences in a spoken language, for communication and thought. Symbol systems such as language are therefore crucial both for our communication with other agents and for our adaptation to the real-world environment. The symbol systems we use in human society change adaptively and dynamically over time. In the context of artificial intelligence (AI) and cognitive systems, the symbol grounding problem has been regarded as one of the central problems related to symbols. However, the symbol grounding problem was originally posed to connect symbolic AI with sensorimotor information, and it did not consider many interdisciplinary phenomena in human communication, nor the dynamic symbol systems of our society, which semiotics has long considered. In this paper, we focus on the symbol emergence problem rather than the symbol grounding problem, addressing not only cognitive dynamics but also the dynamics of symbol systems in society. We first introduce the notion of a symbol in semiotics from the humanities, in order to move beyond the very narrow idea of symbols held in symbolic AI. Furthermore, over the years it has become increasingly clear that symbol emergence must be regarded as a multifaceted problem. Second, therefore, we review the history of the symbol emergence problem in different fields, covering both biological and artificial systems, and show their mutual relations. We summarize the discussion and provide an integrative viewpoint and comprehensive overview of symbol emergence in cognitive systems. Finally, we describe the challenges facing the creation of cognitive systems that can take part in symbol emergence systems.
Symbol Emergence in Robotics: A Survey
Taniguchi, Tadahiro, Nagai, Takayuki, Nakamura, Tomoaki, Iwahashi, Naoto, Ogata, Tetsuya, Asoh, Hideki
Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans form a symbol system and acquire semiotic skills through autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions and developing a robot that can smoothly communicate with human users over the long term both require an understanding of the dynamics of symbol systems, which is therefore crucially important. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach toward an emergent symbol system, one that is socially self-organized through both semiotic communication and physical interaction among autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe state-of-the-art research topics in SER, e.g., multimodal categorization, word discovery, and double articulation analysis, which enable a robot to obtain words and their embodied meanings from raw sensorimotor information, including visual, haptic, and auditory information as well as acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions for research in SER.
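To make the flavor of unsupervised multimodal categorization concrete, here is a minimal sketch, not taken from the survey: per-modality features are normalized, concatenated, and clustered with plain k-means. The SER literature uses richer models (e.g., multimodal latent Dirichlet allocation); this stand-in, with assumed array names and dimensions, only conveys the basic idea of forming object categories from several sensory channels without labels.

```python
import numpy as np

def normalize(F):
    """Z-normalize one modality so no single modality dominates distances."""
    return (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-8)

def multimodal_categorize(modalities, k, iters=50, seed=0):
    """Cluster objects described by several feature matrices (one per
    modality, each of shape [n_objects, dim]) into k categories."""
    X = np.hstack([normalize(F) for F in modalities])
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Example: 20 objects with visual (8-dim), haptic (4-dim), and auditory
# (6-dim) features, grouped into 3 categories with no supervision.
rng = np.random.default_rng(1)
vision = rng.normal(size=(20, 8))
haptic = rng.normal(size=(20, 4))
audio = rng.normal(size=(20, 6))
print(multimodal_categorize([vision, haptic, audio], k=3))
```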
Active Learning for Generating Motion and Utterances in Object Manipulation Dialogue Tasks
Sugiura, Komei (National Institute of Information and Communications Technology) | Iwahashi, Naoto (National Institute of Information and Communications Technology) | Kawai, Hisashi (National Institute of Information and Communications Technology) | Nakamura, Satoshi (National Institute of Information and Communications Technology)
In an object manipulation dialogue, a robot may misunderstand an ambiguous command from a user, such as "Place the cup down (on the table)," potentially resulting in an accident. Although asking a confirmation question before every motion execution would decrease the risk of such failures, the user will find it more convenient if confirmation questions are not asked in trivial situations. This paper proposes a method for estimating the ambiguity of commands by introducing an active learning framework with Bayesian logistic regression into human-robot spoken dialogue. We conducted physical experiments in which a user and a manipulator-based robot communicated using spoken language to manipulate objects.
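A minimal sketch of the core idea follows, under our own assumptions rather than the paper's implementation: Bayesian logistic regression is fit with a Laplace approximation, and a confirmation question is triggered when the predictive probability of successful execution is uncertain (an uncertainty-sampling criterion). The feature vector `x` and the `band` threshold are hypothetical.

```python
import numpy as np

def fit_laplace(X, y, alpha=1.0, iters=50):
    """MAP weights and Laplace posterior covariance for logistic regression
    with a Gaussian prior (precision alpha), fit via Newton's method."""
    n, d = X.shape
    w = np.zeros(d)
    H = alpha * np.eye(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        g = X.T @ (p - y) + alpha * w                     # gradient of neg. log posterior
        H = X.T @ ((p * (1 - p))[:, None] * X) + alpha * np.eye(d)
        w = w - np.linalg.solve(H, g)
    return w, np.linalg.inv(H)                            # posterior mean, covariance

def predictive_prob(x, w, S):
    """Probability of correct execution with weight uncertainty integrated
    out (MacKay's moderated-output approximation)."""
    mu, var = x @ w, x @ S @ x
    return 1.0 / (1.0 + np.exp(-mu / np.sqrt(1.0 + np.pi * var / 8.0)))

def should_confirm(x, w, S, band=0.2):
    """Ask a confirmation question when the command looks ambiguous,
    i.e., the predicted success probability is close to 0.5."""
    return abs(predictive_prob(x, w, S) - 0.5) < band
```

Each confirmed or corrected execution adds a labeled example, so the classifier improves online and trivial commands stop triggering questions; that feedback loop is what makes the framework active rather than passive.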
Robots that Learn to Communicate: A Developmental Approach to Personally and Physically Situated Human-Robot Conversations
Iwahashi, Naoto (National Institute of Information and Communications Technology) | Sugiura, Komei (National Institute of Information and Communications Technology) | Taguchi, Ryo (Nagoya Institute of Technology) | Nagai, Takayuki (University of Electro-Communications) | Taniguchi, Tadahiro (Ritsumeikan University)
This paper summarizes the online machine learning method LCore, which enables robots to learn to communicate with users from scratch through verbal and behavioral interaction in the physical world. LCore combines speech, visual, and tactile information obtained through this interaction, enabling robots to learn beliefs regarding speech units, words, object concepts, motions, grammar, and pragmatic and communicative capabilities. The overall belief system is represented in an integrated way by a dynamic graphical model. Experimental results show that, after a small and practical number of learning episodes with a user, the robot was eventually able to understand even fragmentary and ambiguous utterances, respond to them with confirmation questions and/or actions, generate directive utterances, and answer questions appropriately for the given situation. The paper discusses the importance of a developmental approach to realizing personally and physically situated human-robot conversations.
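The following sketch is not LCore itself, but it illustrates the fusion idea in miniature: each modality contributes a log-likelihood for a candidate interpretation of an utterance, the beliefs are combined log-linearly, and a small margin between the top two interpretations signals the kind of ambiguity that would prompt a confirmation question. The model and weight names are assumptions made for this example.

```python
def fuse_beliefs(utterance, candidate, models, weights):
    """Log-linear combination of per-modality belief scores, e.g.,
    models = {"speech": f1, "vision": f2, "touch": f3}, where each
    f(utterance, candidate) returns a log-likelihood."""
    return sum(weights[m] * models[m](utterance, candidate) for m in models)

def interpret(utterance, candidates, models, weights, margin=1.0):
    """Pick the best-scoring interpretation; a small margin between the
    top two scores flags ambiguity (a cue for a confirmation question)."""
    scored = sorted(((fuse_beliefs(utterance, c, models, weights), c)
                     for c in candidates),
                    key=lambda t: t[0], reverse=True)
    best_score, best = scored[0]
    ambiguous = len(scored) > 1 and best_score - scored[1][0] < margin
    return best, ambiguous
```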
Grounding New Words on the Physical World in Multi-Domain Human-Robot Dialogues
Nakano, Mikio (Honda Research Institute Japan Co., Ltd.) | Iwahashi, Naoto (ATR Media Information Science Research Laboratories / National Institute of Information and Communications Technology) | Nagai, Takayuki (University of Electro-Communications) | Sumii, Taisuke (ATR Media Information Science Research Laboratories / Kyoto Institute of Technology) | Zuo, Xiang (ATR Media Information Science Research Laboratories / Kyoto Institute of Technology) | Taguchi, Ryo (ATR Media Information Science Research Laboratories / Nagoya Institute of Technology) | Nose, Takashi (ATR Media Information Science Research Laboratories / Tokyo Institute of Technology) | Mizutani, Akira (University of Electro-Communications) | Nakamura, Tomoaki (University of Electro-Communications) | Attamimi, Muhammad (University of Electro-Communications) | Narimatsu, Hiromi (University of Electro-Communications) | Funakoshi, Kotaro (Honda Research Institute Japan Co., Ltd.) | Hasegawa, Yuji (Honda Research Institute Japan Co., Ltd.)
This paper summarizes our ongoing project on developing an architecture for a robot that can acquire new words and their meanings while engaging in multi-domain dialogues. These two functions are crucial for making conversational service robots work on real tasks in the real world. Household robots and office robots need to be able to work in multiple task domains, and they also need to engage in dialogues in the multiple domains corresponding to those task domains. Lexical acquisition is necessary because speech understanding is impossible without sufficient knowledge of the words that may be spoken in the task domain. Our architecture is based on a multi-expert model in which multiple domain experts are employed and one of them is selected, based on the user utterance and the situation, to take control of the dialogue and physical behaviors. We incorporate experts that can acquire new lexical entries and their meanings, grounded in the physical world, through spoken interaction. By appropriately selecting these experts, lexical acquisition in multi-domain dialogues becomes possible. An example robotic system based on this architecture, which can acquire object names and location names, demonstrates the viability of the architecture.
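A minimal sketch of the multi-expert selection idea follows, under our own assumptions (the class names, the confidence heuristic, and the grounding step are all illustrative, not the paper's implementation): each domain expert scores its confidence for the current utterance and situation, the most confident expert takes control of the turn, and a lexical-acquisition expert grounds unknown words in the currently perceived object or location.

```python
from dataclasses import dataclass, field

@dataclass
class DomainExpert:
    name: str
    known_words: set
    lexicon: dict = field(default_factory=dict)  # acquired word -> grounded meaning

    def confidence(self, utterance, situation):
        """Toy confidence: the fraction of utterance words this expert
        already knows; a real system would also exploit the situation."""
        words = utterance.lower().split()
        return sum(w in self.known_words or w in self.lexicon
                   for w in words) / max(len(words), 1)

    def handle(self, utterance, situation):
        """A lexical-acquisition expert grounds each unknown word in the
        currently perceived object or location."""
        for w in utterance.lower().split():
            if w not in self.known_words and w not in self.lexicon:
                self.lexicon[w] = situation.get("percept")

def select_expert(experts, utterance, situation):
    """Hand control of the dialogue turn to the most confident expert."""
    return max(experts, key=lambda e: e.confidence(utterance, situation))

# Example turn: the object-naming expert wins the selection and acquires
# the unknown word "mug", grounded in the object currently in view.
experts = [DomainExpert("objects", {"bring", "the"}),
           DomainExpert("locations", {"go", "to", "the"})]
situation = {"percept": "object#3"}
expert = select_expert(experts, "bring the mug", situation)
expert.handle("bring the mug", situation)
print(expert.name, expert.lexicon)  # objects {'mug': 'object#3'}
```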