robot perception
Toward a Universal Concept of Artificial Personality: Implementing Robotic Personality in a Kinova Arm
Nardelli, Alice, Landolfi, Lorenzo, Pasquali, Dario, Sgorbissa, Antonio, Rea, Francesco, Recchiuto, Carmine
The fundamental role of personality in shaping interactions is increasingly being exploited in robotics. A carefully designed robotic personality has been shown to improve several key aspects of Human-Robot Interaction (HRI). However, the fragmentation and rigidity of existing approaches reveal even greater challenges when applied to non-humanoid robots. On one hand, the state of the art is very dispersed; on the other hand, Industry 4.0 is moving towards a future where humans and industrial robots are going to coexist. In this context, the proper design of a robotic personality can lead to more successful interactions. This research takes a first step in that direction by integrating a comprehensive cognitive architecture built upon the definition of robotic personality - validated on humanoid robots - into a robotic Kinova Jaco2 arm. The robot personality is defined through the cognitive architecture as a vector in the three-dimensional space encompassing Conscientiousness, Extroversion, and Agreeableness, affecting how actions are executed, the action selection process, and the internal reaction to environmental stimuli. Our main objective is to determine whether users perceive distinct personalities in the robot, regardless of its shape, and to understand the role language plays in shaping these perceptions. To achieve this, we conducted a user study comprising 144 sessions of a collaborative game between a Kinova Jaco2 arm and participants, where the robot's behavior was influenced by its assigned personality. Furthermore, we compared two conditions: in the first, the robot communicated solely through gestures and action choices, while in the second, it also utilized verbal interaction.
iTeach: Interactive Teaching for Robot Perception using Mixed Reality
P, Jishnu Jaykumar, Salvato, Cole, Bomnale, Vinaya, Wang, Jikai, Xiang, Yu
We introduce iTeach, a Mixed Reality (MR) framework to improve robot perception through real-time interactive teaching. By allowing human instructors to dynamically label robot RGB data, iTeach improves both the accuracy and adaptability of robot perception to new scenarios. The framework supports on-the-fly data collection and labeling, enhancing model performance and generalization. Applied to door and handle detection for household tasks, iTeach integrates a HoloLens app with an interactive YOLO model. Furthermore, we introduce the IRVLUTD DoorHandle dataset. DH-YOLO, our efficient detection model, significantly enhances the accuracy and efficiency of door and handle detection, highlighting the potential of MR to make robotic systems more capable and adaptive in real-world environments. The project page is available at https://irvlutd.github.io/iTeach.
CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation
Jiang, Chen, Yang, Yuchen, Jagersand, Martin
The classical human-robot interface in uncalibrated image-based visual servoing (UIBVS) relies on either human annotations or semantic segmentation with categorical labels. Both methods fail to match natural human communication and convey rich semantics in manipulation tasks as effectively as natural language expressions. In this paper, we tackle this problem by using referring expression segmentation, which is a prompt-based approach, to provide more in-depth information for robot perception. To generate high-quality segmentation predictions from referring expressions, we propose CLIPUNetr - a new CLIP-driven referring expression segmentation network. CLIPUNetr leverages CLIP's strong vision-language representations to segment regions from referring expressions, while utilizing its "U-shaped" encoder-decoder architecture to generate predictions with sharper boundaries and finer structures. Furthermore, we propose a new pipeline to integrate CLIPUNetr into UIBVS and apply it to control robots in real-world environments. In experiments, our method improves boundary and structure measurements by an average of 120% and can successfully assist real-world UIBVS control in an unstructured manipulation environment.
Sounds of action: Using ears, not just eyes, improves robot perception
People rarely use just one sense to understand the world, but robots usually only rely on vision and, increasingly, touch. Carnegie Mellon University researchers find that robot perception could improve markedly by adding another sense: hearing. In what they say is the first large-scale study of the interactions between sound and robotic action, researchers at CMU's Robotics Institute found that sounds could help a robot differentiate between objects, such as a metal screwdriver and a metal wrench. Hearing also could help robots determine what type of action caused a sound and help them use sounds to predict the physical properties of new objects. "A lot of preliminary work in other fields indicated that sound could be useful, but it wasn't clear how useful it would be in robotics," said Lerrel Pinto, who recently earned his Ph.D. in robotics at CMU and will join the faculty of New York University this fall.
New Algorithm Improves Robot Perception
Robot perception relies on sensors that often provide noisy or misleading inputs. MIT's GNC algorithm allows the robot to decide which data points to trust and which to discard. One application of the GNC algorithm is called shape alignment. A robot estimates the 3D location and orientation of a car using 2D camera images. The robot receives a camera image with many points labeled by a feature-detection algorithm: headlights, wheels, mirrors.
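The idea of deciding which data points to trust can be illustrated with a small sketch. It is written in the spirit of graduated non-convexity with a Geman-McClure-style weight, applied to a toy one-dimensional problem (estimating a single value from noisy measurements). It is not MIT's implementation: the function name `gnc_mean`, the noise bound `c`, and the schedule factor `mu_factor` are assumptions chosen for illustration.

```python
def gnc_mean(values, c=0.5, mu_factor=1.4, max_iters=100):
    """Robustly estimate one scalar from measurements that contain outliers.

    Toy illustration only. `c` is an assumed bound on inlier noise;
    `mu` controls how non-convex the robust cost is and is lowered
    step by step toward 1.
    """
    c2 = c * c
    # Start from the ordinary (non-robust) mean.
    estimate = sum(values) / len(values)
    residuals_sq = [(v - estimate) ** 2 for v in values]
    # A large initial mu makes every weight close to 1 (nearly least squares).
    mu = max(1.0, 2.0 * max(residuals_sq) / c2)
    weights = [1.0] * len(values)

    for _ in range(max_iters):
        # Weight update: large residuals receive weights near 0.
        weights = [(mu * c2 / (r2 + mu * c2)) ** 2 for r2 in residuals_sq]
        # Variable update: weighted least squares (here, a weighted mean).
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        residuals_sq = [(v - estimate) ** 2 for v in values]
        if mu <= 1.0:
            break
        # Gradually sharpen the robust cost.
        mu = max(1.0, mu / mu_factor)

    return estimate, weights


if __name__ == "__main__":
    # Five measurements near 1.0 plus two gross outliers.
    data = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0, 12.0]
    value, trust = gnc_mean(data)
    print(round(value, 3), [round(w, 3) for w in trust])
```

In this sketch the weights play the role of trust: measurements that stay far from the current estimate end up with weights near zero and so are effectively discarded, while consistent measurements keep weights near one.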
Deep Science: Robot perception, acoustic monitoring, using ML to detect arthritis – TechCrunch
Research papers come out far too rapidly for anyone to read them all, especially in the field of machine learning, which now affects (and produces papers in) practically every industry and company. This column aims to collect the most relevant recent discoveries and papers -- particularly in but not limited to artificial intelligence -- and explain why they matter. The topics in this week's Deep Science column are a real grab bag that ranges from planetary science to whale tracking. There are also some interesting insights from tracking how social media is used and some work that attempts to shift computer vision systems closer to human perception (good luck with that). One of machine learning's most reliable use cases is training a model on a target pattern, say a particular shape or radio signal, and setting it loose on a huge body of noisy data to find possible hits that humans might struggle to perceive.
Sounds of action: Using ears, not just eyes, improves robot perception: Carnegie Mellon builds dataset capturing interaction of sound, action, vision
In what they say is the first large-scale study of the interactions between sound and robotic action, researchers at CMU's Robotics Institute found that sounds could help a robot differentiate between objects, such as a metal screwdriver and a metal wrench. Hearing also could help robots determine what type of action caused a sound and help them use sounds to predict the physical properties of new objects. "A lot of preliminary work in other fields indicated that sound could be useful, but it wasn't clear how useful it would be in robotics," said Lerrel Pinto, who recently earned his Ph.D. in robotics at CMU and will join the faculty of New York University this fall. He and his colleagues found the performance rate was quite high, with robots that used sound successfully classifying objects 76 percent of the time. The results were so encouraging, he added, that it might prove useful to equip future robots with instrumented canes, enabling them to tap on objects they want to identify.
Personalized Machine Learning for Robot Perception of Affect and Engagement in Autism Therapy
Rudovic, Ognjen, Lee, Jaeryoung, Dai, Miles, Schuller, Bjorn, Picard, Rosalind
Robots have great potential to facilitate future therapies for children on the autism spectrum. However, existing robots lack the ability to automatically perceive and respond to human affect, which is necessary for establishing and maintaining engaging interactions. Moreover, their inference challenge is made harder by the fact that many individuals with autism have atypical and unusually diverse styles of expressing their affective-cognitive states. To tackle the heterogeneity in behavioral cues of children with autism, we use the latest advances in deep learning to formulate a personalized machine learning (ML) framework for automatic perception of the children's affective states and engagement during robot-assisted autism therapy. The key to our approach is a novel shift from the traditional ML paradigm - instead of using 'one-size-fits-all' ML models, our personalized ML framework is optimized for each child by leveraging relevant contextual information (demographics and behavioral assessment scores) and individual characteristics of each child. We designed and evaluated this framework using a dataset of multi-modal audio, video and autonomic physiology data of 35 children with autism (age 3-13) and from 2 cultures (Asia and Europe), participating in a 25-minute child-robot interaction (~500k datapoints). Our experiments confirm the feasibility of the robot perception of affect and engagement, showing clear improvements due to the model personalization. The proposed approach has potential to improve existing therapies for autism by offering more efficient monitoring and summarization of the therapy progress.
Robot Perception of Human Groups in the Real World: State of the Art
Taylor, Angelique (University of California San Diego) | Riek, Laurel D. (University of California San Diego)
As robots enter human spaces and begin to work proximately with people, it is important that they understand human social interaction. They must be able to perceive human social signals and understand how to adapt to groups. The goal of our work is to design robot perception algorithms that allow robots to understand human group dynamics via social cues, and understand how to behave collaboratively in groups. In this paper, we discuss the current state-of-the-art of two fields that have contributed methods to achieve this goal, social signal processing and computer vision. We describe recent advances in these fields, as well as some of the challenges faced when adapting them to mobile robots.
Occupancy Grids: A Stochastic Spatial Representation for Active Robot Perception
In this paper we provide an overview of a new framework for robot perception, real-world modelling, and navigation that uses a stochastic tessellated representation of spatial information called the Occupancy Grid. The Occupancy Grid is a multi-dimensional random field model that maintains probabilistic estimates of the occupancy state of each cell in a spatial lattice. Bayesian estimation mechanisms employing stochastic sensor models allow incremental updating of the Occupancy Grid using multi-view, multi-sensor data, composition of multiple maps, decision-making, and incorporation of robot and sensor position uncertainty. We present the underlying stochastic formulation of the Occupancy Grid framework, and discuss its application to a variety of robotic tasks. These include range-based mapping, multi-sensor integration, path-planning and obstacle avoidance, handling of robot position uncertainty, incorporation of pre-compiled maps, recovery of geometric representations, and other related problems. The experimental results show that the Occupancy Grid approach generates dense world models, is robust under sensor uncertainty and errors, and allows explicit handling of uncertainty. It supports the development of robust and agile sensor interpretation methods, incremental discovery procedures, and composition of information from multiple sources. Furthermore, the results illustrate that robotic tasks can be addressed through operations performed directly on the Occupancy Grid, and that these operations have strong parallels to operations performed in the image processing domain.
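The incremental Bayesian update of per-cell occupancy estimates described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: the grid is one-dimensional, each cell is updated independently, and the sensor model probabilities (`P_HIT_GIVEN_OCC`, `P_HIT_GIVEN_EMPTY`) are hypothetical values chosen for the example.

```python
# Hypothetical sensor model (assumed values, not taken from the paper):
P_HIT_GIVEN_OCC = 0.9    # probability the sensor reports "hit" if the cell is occupied
P_HIT_GIVEN_EMPTY = 0.2  # probability the sensor reports "hit" if the cell is empty


def bayes_update(prior, hit):
    """Return P(occupied | reading) for one cell using Bayes' rule."""
    if hit:
        like_occ, like_emp = P_HIT_GIVEN_OCC, P_HIT_GIVEN_EMPTY
    else:
        like_occ, like_emp = 1.0 - P_HIT_GIVEN_OCC, 1.0 - P_HIT_GIVEN_EMPTY
    numerator = like_occ * prior
    return numerator / (numerator + like_emp * (1.0 - prior))


def integrate_readings(grid, readings):
    """Fold a sequence of (cell_index, hit) readings into the grid, in order.

    Each posterior becomes the prior for the next reading of that cell,
    which is what makes the update incremental.
    """
    updated = list(grid)
    for index, hit in readings:
        updated[index] = bayes_update(updated[index], hit)
    return updated


if __name__ == "__main__":
    # Five cells with an uninformative prior of 0.5 (unknown occupancy).
    grid = [0.5] * 5
    readings = [(2, True), (2, True), (0, False), (1, False)]
    print([round(p, 3) for p in integrate_readings(grid, readings)])
```

Cells that receive no readings stay at the prior of 0.5, so the grid keeps an explicit record of what is still unknown. Readings from several sensors could be folded in through the same function, given a sensor model for each.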