Collaborating Authors

 Recchiuto, Carmine Tommaso


Labeling Sentences with Symbolic and Deictic Gestures via Semantic Similarity

arXiv.org Artificial Intelligence

Co-speech gesture generation on artificial agents has gained attention recently, mainly when it is based on data-driven models. However, end-to-end methods often fail to generate co-speech gestures related to semantics with specific forms, i.e., Symbolic and Deictic gestures. In this work, we identify which words in a sentence are contextually related to Symbolic and Deictic gestures. First, we selected 12 gestures that are recognized by people from the Italian culture and that different humanoid robots can reproduce. Then, we implemented two rule-based algorithms to label sentences with Symbolic and Deictic gestures. The rules depend on the semantic similarity scores computed with the RoBERTa model between sentences that heuristically represent gestures and sub-sentences inside a target sentence that artificial agents have to pronounce. We also implemented a baseline algorithm that assigns gestures without computing similarity scores. Finally, to validate the results, we asked 30 people to label a set of sentences with Deictic and Symbolic gestures through a Graphical User Interface (GUI), and we compared their labels with the ones produced by our algorithms. For this purpose, we computed Average Precision (AP) and Intersection Over Union (IOU) scores, and we evaluated the Average Computational Time (ACT). Our results show that semantic similarity scores are useful for finding Symbolic and Deictic gestures in utterances.
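As an illustration of the IOU comparison mentioned in the abstract, the following minimal sketch scores the overlap between the words an algorithm labels with a gesture and the words a human annotator labels. The abstract does not specify how spans are represented, so the word-index sets, the example sentence, and the function name `iou` are assumptions made here for illustration only.

```python
def iou(predicted, reference):
    """Intersection over Union of two collections of word indices."""
    predicted, reference = set(predicted), set(reference)
    union = predicted | reference
    if not union:
        # Both labelings are empty: treated as perfect agreement.
        # This convention is an assumption of this sketch.
        return 1.0
    return len(predicted & reference) / len(union)


# Hypothetical example sentence, split into words (indices 0..6).
sentence = "please come here and look at that".split()

algo_span = range(1, 3)   # algorithm labels "come here"
human_span = range(1, 4)  # annotator labels "come here and"

score = iou(algo_span, human_span)
print(f"IOU = {score:.3f}")
```

Here the two spans share two words out of three in their union, so the score is 2/3. Averaging such scores over all sentences and annotators would give one aggregate agreement figure, although the paper may aggregate differently.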


Enhancing LLM-Based Human-Robot Interaction with Nuances for Diversity Awareness

arXiv.org Artificial Intelligence

This paper presents a system for diversity-aware autonomous conversation leveraging the capabilities of large language models (LLMs). The system adapts to diverse populations and individuals, considering factors like background, personality, age, gender, and culture. The conversation flow is guided by the structure of the system's pre-established knowledge base, while LLMs are tasked with various functions, including generating diversity-aware sentences. Achieving diversity-awareness involves providing carefully crafted prompts to the models, incorporating comprehensive information about users, conversation history, contextual details, and specific guidelines. To assess the system's performance, we conducted both controlled and real-world experiments, measuring a wide range of performance indicators.


Collaborative Active SLAM: Synchronous and Asynchronous Coordination Among Agents

arXiv.org Artificial Intelligence

In autonomous robotics, a critical challenge lies in developing robust solutions for Active Collaborative SLAM, wherein multiple robots collaboratively explore and map an unknown environment while intelligently coordinating their movements and sensor data acquisitions. In this article, we present two approaches for coordinating a system consisting of multiple robots to perform Active Collaborative SLAM (AC-SLAM) for environmental exploration. Our two coordination approaches, synchronous and asynchronous, implement a methodology in which the central server prioritizes robot goal assignments. We also present a method to efficiently spread the robots for maximum exploration while keeping SLAM uncertainty low. Both coordination approaches were evaluated through simulation and experiments on publicly available datasets, yielding promising results.


Entropy Based Multi-robot Active SLAM

arXiv.org Artificial Intelligence

The objective is to find the optimal state vector that minimizes the measurement error between the estimated pose and environmental landmarks. Most SLAM algorithms are passive, i.e., the robot is controlled manually and the navigation or path planning algorithm does not actively take part in robot motion or trajectory. Active SLAM (A-SLAM), however, tries to solve the optimal exploration problem of the unknown environment by proposing a navigation strategy that generates future goal/target positions and actions that decrease map and pose uncertainties, thus enabling a fully autonomous navigation and mapping SLAM system without the need for an external controller or human effort. In Active Collaborative SLAM (AC-SLAM), multiple robots exchange information to improve their localization estimation and map accuracy in order to achieve high-level tasks such as exploration. The exchanged information can be localization information [1], entropy [2], visual features [3], and frontier points [4]. In this article, we present a multi-agent AC-SLAM system for efficient environment exploration using frontiers detected over an Occupancy Grid (OG) map. In particular, in this work, we aim at: 1. Extending the A-SLAM approach of [5] which uses a computationally inexpensive D-optimality criterion for utility computation to a multi-agent AC-SLAM framework.
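The excerpt does not state how the D-optimality criterion is computed. A commonly used form in the active SLAM literature is the geometric mean of the eigenvalues of an n x n covariance matrix, i.e. det(Sigma)^(1/n). The sketch below evaluates that form in log space; the choice of this particular formula, the use of a covariance rather than an information matrix, and the function names are assumptions, not the method of [5].

```python
import math


def cholesky_log_det(cov):
    """Log-determinant of a symmetric positive definite matrix.

    Uses a plain Cholesky factorization cov = L * L^T, so that
    log(det(cov)) = 2 * sum(log(L[i][i])).
    """
    n = len(cov)
    lower = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = cov[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix must be symmetric positive definite")
                lower[i][j] = math.sqrt(pivot)
                log_det += 2.0 * math.log(lower[i][j])
            else:
                lower[i][j] = (cov[i][j] - partial) / lower[j][j]
    return log_det


def d_optimality(cov):
    """det(cov) ** (1 / n), computed in log space for numerical stability."""
    return math.exp(cholesky_log_det(cov) / len(cov))


# Hypothetical 2x2 pose covariance: det = 4, so the criterion is sqrt(4).
d_opt = d_optimality([[4.0, 0.0], [0.0, 1.0]])
print(d_opt)
```

Under this form, a smaller value means a tighter uncertainty ellipsoid, so a utility function could favor candidate goals expected to reduce it. Working in log space avoids overflow or underflow of the determinant for large state vectors.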


Culture-to-Culture Image Translation and User Evaluation

arXiv.org Artificial Intelligence

The article introduces the concept of image "culturization," which we define as the process of altering the "brushstroke of cultural features" that make objects perceived as belonging to a given culture while preserving their functionalities. First, we defined a pipeline for translating objects' images from a source to a target cultural domain based on state-of-the-art Generative Adversarial Networks. Then, we gathered data through an online questionnaire to test four hypotheses concerning the impact of images belonging to different cultural domains on Italian participants. As expected, results depend on individual tastes and preferences; however, they align with our conjecture that some people, during the interaction with an intelligent system, will prefer to be shown images modified to match their cultural background. The study has two main limitations. First, we focussed on the culturization of individual objects instead of complete scenes. However, objects play a crucial role in conveying cultural meanings and can strongly influence how an image is perceived within a specific cultural context. Understanding and addressing object-level translation is a vital step toward achieving more comprehensive scene-level translation in future research. Second, we performed experiments with Italian participants only. We think that there are unique aspects of Italian culture that make it an interesting and relevant case study for exploring the impact of image culturization. Italy is a very culturally conservative society, and Italians have specific sensitivities and expectations regarding the accurate representation of their cultural identity and traditions, which can shape individuals' preferences and inclinations toward certain visual styles, aesthetics, and design choices. As a consequence, we think they are an ideal candidate for a preliminary investigation of image culturization.


Sustainable Cloud Services for Verbal Interaction with Embodied Agents

arXiv.org Artificial Intelligence

This article presents the design and the implementation of a cloud system for knowledge-based autonomous interaction devised for Social Robots and other conversational agents. The system is particularly convenient for low-cost robots and devices: it can be used as a stand-alone dialogue system or as an integration to provide "background" dialogue capabilities to any preexisting Natural Language Processing ability that the robot may already have as part of its basic skills. By connecting to the cloud, developers are provided with a sustainable solution to manage verbal interaction through a network connection, with about 3,000 topics of conversation ready for "chit-chatting" and a library of pre-cooked plans that only needs to be grounded into the robot's physical capabilities. The system is structured as a set of REST API endpoints so that it can be easily expanded by adding new APIs to improve the capabilities of the clients connected to the cloud. Another key feature of the system is that it has been designed to make the development of its clients straightforward: in this way, multiple robots and devices can be easily endowed with the capability of autonomously interacting with the user, understanding when to perform specific actions, and exploiting all the information provided by cloud services. The article outlines and discusses the results of the experiments performed to assess the system's performance in terms of response time, paving the way for its use both for research and market solutions. Links to repositories with clients for ROS and popular robots such as Pepper and NAO are available on request.


Knowledge Triggering, Extraction and Storage via Human-Robot Verbal Interaction

arXiv.org Artificial Intelligence

This article describes a novel approach to expanding the knowledge base of an Artificial Conversational Agent at run time. A technique for automatic knowledge extraction from the user's sentence and four methods to insert the newly acquired concepts into the knowledge base have been developed and integrated into a system that has already been tested for knowledge-based conversation between a social humanoid robot and residents of care homes. Adding new knowledge at run time helps overcome a limitation that affects most robots and chatbots: the inability to engage the user for a long time because of the restricted number of conversation topics. Inserting new concepts recognized in the user's sentence into the knowledge base is expected to widen the range of topics that can be covered during an interaction, making the conversation less repetitive. Two experiments are presented to assess the performance of the knowledge extraction technique and the efficiency of the developed insertion methods when adding several concepts to the Ontology.