Jamali, Nawid
ViHOPE: Visuotactile In-Hand Object 6D Pose Estimation with Shape Completion
Li, Hongyu, Dikhale, Snehal, Iba, Soshi, Jamali, Nawid
In this letter, we introduce ViHOPE, a novel framework for estimating the 6D pose of an in-hand object using visuotactile perception. Our key insight is that the accuracy of the 6D object pose estimate can be improved by explicitly completing the shape of the object. To this end, we introduce a novel visuotactile shape completion module that uses a conditional Generative Adversarial Network to complete the shape of an in-hand object based on a volumetric representation. This approach improves over prior works that directly regress visuotactile observations to a 6D pose. By explicitly completing the shape of the in-hand object and jointly optimizing the shape completion and pose estimation tasks, we improve the accuracy of the 6D object pose estimate. We train and test our model on a synthetic dataset and compare it with the state-of-the-art. In the visuotactile shape completion task, we outperform the state-of-the-art by 265% on the Intersection over Union metric and achieve an 88% lower Chamfer Distance. In the visuotactile pose estimation task, we present results that suggest our framework reduces position and angular errors by 35% and 64%, respectively. Furthermore, we ablate our framework to confirm the gain in the 6D object pose estimate from explicitly completing the shape. Ultimately, we show that our framework produces models that are robust to sim-to-real transfer on a real-world robot platform.
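To make the joint shape-completion and pose-estimation idea concrete, below is a minimal PyTorch sketch of a voxel-based generator whose shared latent code feeds both a shape decoder and a pose head. The 32³ grid, layer sizes, and names are illustrative assumptions rather than ViHOPE's actual architecture, and the cGAN discriminator is omitted for brevity.

```python
import torch
import torch.nn as nn

class VoxelCompletionNet(nn.Module):
    """Completes a partial visuotactile voxel grid and regresses a 6D pose
    from a shared latent code (hypothetical layer sizes; discriminator omitted)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        # Pose head: translation (3) + quaternion (4) from the same latent code.
        self.pose = nn.Sequential(nn.Flatten(), nn.Linear(32 * 8 ** 3, 7))

    def forward(self, partial_voxels):
        z = self.enc(partial_voxels)          # shared latent code
        return self.dec(z), self.pose(z)      # completed shape, 6D pose

net = VoxelCompletionNet()
partial = torch.rand(2, 1, 32, 32, 32)        # batch of partial observations
completed, pose = net(partial)
# Joint objective: voxel-wise BCE against the ground-truth shape plus a pose
# regression loss (an adversarial term from a cGAN discriminator would be added).
target = (torch.rand_like(completed) > 0.5).float()   # dummy ground-truth voxels
loss = nn.functional.binary_cross_entropy(completed, target)
```

Sharing the encoder is what makes the joint optimization meaningful: gradients from both the shape and pose losses update the same latent representation, which is the gain the abstract attributes to explicit shape completion.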
"How Did They Come Across?" Lessons Learned from Continuous Affective Ratings
Parreira, Maria Teresa, Sack, Michael J., Javed, Hifza, Jamali, Nawid, Jung, Malte
Social distance, or perception of the other, is recognized as a dynamic dimension of an interaction, but has yet to be widely explored or understood. Through CORAE, a novel web-based open-source tool for COntinuous Retrospective Affect Evaluation, we collected retrospective ratings of interpersonal perceptions between 12 participant dyads. In this work, we explore how different aspects of these interactions are reflected in the collected ratings, through a discourse analysis of the individual and social behavior of the interactants. We found that different events observed in the ratings can be mapped to complex interaction phenomena, shedding light on relevant interaction features that may play a role in interpersonal understanding and grounding. This paves the way for better, more seamless human-robot interactions, where affect is interpreted as highly dynamic and contingent on interaction history.
Group Dynamics: Survey of Existing Multimodal Models and Considerations for Social Mediation
Javed, Hifza, Jamali, Nawid
Social mediator robots facilitate human-human interactions by producing behavior strategies that positively influence how humans interact with each other in social settings. As robots for social mediation gain traction in the field of human-human-robot interaction, their ability to "understand" the humans in their environments becomes crucial. This objective requires models of human understanding that consider multiple humans in an interaction as a collective entity and represent the group dynamics that exist among its members. Group dynamics are defined as the influential actions, processes, and changes that occur within and between group interactants. Since an individual's behavior may be deeply influenced by their interactions with other group members, the social dynamics existing within a group can influence the behaviors, attitudes, and opinions of each individual and the group as a whole. Therefore, models of group dynamics are critical for a social mediator robot to be effective in its role. In this paper, we survey existing models of group dynamics and categorize them into models of social dominance, affect, social cohesion, conflict resolution, and engagement. We highlight the multimodal features these models utilize, and emphasize the importance of capturing the interpersonal aspects of a social interaction. Finally, we make a case for models of relational affect as an approach that may be able to capture a representation of human-human interactions that can be useful for social mediation.
What Could a Social Mediator Robot Do? Lessons from Real-World Mediation Scenarios
Weisswange, Thomas H., Javed, Hifza, Dietrich, Manuel, Pham, Tuan Vu, Parreira, Maria Teresa, Sack, Michael, Jamali, Nawid
The use of social robots as instruments for social mediation has been gaining traction in the field of Human-Robot Interaction (HRI). So far, the design of such robots and their behaviors is often driven by technological platforms and experimental setups in controlled laboratory environments. To address complex social relationships in the real world, it is crucial to consider the actual needs and consequences of the situations found therein. This includes understanding when a mediator is necessary, what specific role such a robot could play, and how it moderates human social dynamics. In this paper, we discuss six relevant roles for robotic mediators that we identified by investigating a collection of videos showing realistic group situations. We further discuss mediation behaviors and target measures to evaluate the success of such interventions. We hope that our findings can inspire future research on robot-assisted social mediation by highlighting a wider set of mediation applications than those found in prior studies. Specifically, we aim to inform the categorization and selection of interaction scenarios that reflect real situations, where a mediation robot can have a positive and meaningful impact on group dynamics.
Modeling Group Dynamics for Personalized Robot-Mediated Interactions
Javed, Hifza, Jamali, Nawid
The field of human-human-robot interaction (HHRI) uses social robots to positively influence how humans interact with each other. This objective requires models of human understanding that consider multiple humans in an interaction as a collective entity and represent the group dynamics that exist within it. Understanding group dynamics is important because these can influence the behaviors, attitudes, and opinions of each individual within the group, as well as the group as a whole. Such an understanding is also useful when personalizing an interaction between a robot and the humans in its environment, where a group-level model can facilitate the design of robot behaviors that are tailored to a given group, the dynamics that exist within it, and the specific needs and preferences of the individual interactants. In this paper, we highlight the need for group-level models of human understanding in human-human-robot interaction research and how these can be useful in developing personalization techniques. We survey existing models of group dynamics and categorize them into models of social dominance, affect, social cohesion, and conflict resolution. We highlight the important features these models utilize, evaluate their potential to capture interpersonal aspects of a social interaction, and highlight their value for personalization techniques. Finally, we identify directions for future work, and make a case for models of relational affect as an approach that can better capture group-level understanding of human-human interactions and be useful in personalizing human-human-robot interactions.
Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects
Rezazadeh, Alireza, Dikhale, Snehal, Iba, Soshi, Jamali, Nawid
Robotic manipulation, in particular in-hand object manipulation, often requires an accurate estimate of the object's 6D pose. To improve the accuracy of the estimated pose, state-of-the-art approaches in 6D object pose estimation use observational data from one or more modalities, e.g., RGB images, depth, and tactile readings. However, existing approaches make limited use of the underlying geometric structure of the object captured by these modalities, thereby increasing their reliance on visual features. This results in poor performance when presented with objects that lack such visual features or when visual features are simply occluded. Furthermore, current approaches do not take advantage of the proprioceptive information embedded in the position of the fingers. To address these limitations, in this paper: (1) we introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data that allows for a geometrically informed 6D object pose estimation, (2) we introduce a hierarchical message passing operation that propagates information within and across modalities to learn a graph-based object representation, and (3) we introduce a method that accounts for the proprioceptive information for in-hand object representation. We evaluate our model on a diverse subset of objects from the YCB Object and Model Set, and show that our method substantially outperforms existing state-of-the-art work in accuracy and robustness to occlusion. We also deploy our proposed framework on a real robot and qualitatively demonstrate successful transfer to real settings.
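A minimal sketch of the within- and across-modality message-passing idea follows. The node counts, feature dimension, dense adjacency construction, and two-stage update are illustrative assumptions, not the paper's hierarchical architecture.

```python
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    """One round of message passing: mean-aggregate neighbors, then update."""
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, x, adj):
        # x: (N, dim) node features; adj: (N, N) binary adjacency matrix.
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        messages = adj @ x / deg                       # mean over neighbors
        return self.update(torch.cat([x, messages], dim=-1))

dim = 64
vision = torch.randn(10, dim)   # vision nodes (e.g., image features)
touch = torch.randn(4, dim)     # tactile nodes; proprioception (finger positions)
                                # could be concatenated into these features
x = torch.cat([vision, touch])

intra = torch.block_diag(torch.ones(10, 10), torch.ones(4, 4))  # within-modality edges
inter = torch.ones(14, 14) - intra                              # across-modality edges

within, across = GraphLayer(dim), GraphLayer(dim)
x = within(x, intra)            # first share information within each modality...
x = across(x, inter)           # ...then exchange it across modalities
pose = nn.Linear(dim, 7)(x.mean(dim=0))   # pooled graph -> translation + quaternion
```

Separating the within-modality and across-modality passes is one way to keep the fused representation geometrically informed rather than dominated by whichever modality has more nodes.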
VisuoSpatial Foresight for Physical Sequential Fabric Manipulation
Hoque, Ryan, Seita, Daniel, Balakrishna, Ashwin, Ganapathi, Aditya, Tanwani, Ajay Kumar, Jamali, Nawid, Yamane, Katsu, Iba, Soshi, Goldberg, Ken
Robotic fabric manipulation has applications in home robotics, textiles, senior care, and surgery. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. We build upon the Visual Foresight framework to learn fabric dynamics that can be efficiently reused to accomplish different sequential fabric manipulation tasks with a single goal-conditioned policy. We extend our earlier work on VisuoSpatial Foresight (VSF), which learns visual dynamics on domain-randomized RGB images and depth maps simultaneously and completely in simulation. In this earlier work, we evaluated VSF on multi-step fabric smoothing and folding tasks against 5 baseline methods in simulation and on the da Vinci Research Kit (dVRK) surgical robot without any demonstrations at train or test time. A key finding was that depth sensing significantly improves performance: RGBD data yields an 80% improvement in fabric folding success rate in simulation over pure RGB data. In this work, we vary 4 components of VSF, including data generation, the choice of visual dynamics model, cost function, and optimization procedure. Results suggest that training visual dynamics models using longer, corner-based actions can improve the efficiency of fabric folding by 76% and enable a physical sequential fabric folding task that VSF could not previously perform, with 90% reliability. Code, data, videos, and supplementary material are available at https://sites.google.com/view/fabric-vsf/.
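The planning loop common to Visual Foresight-style methods, including the cost function and optimization procedure the abstract varies, can be sketched with a generic cross-entropy-method (CEM) planner. The dynamics and cost stand-ins below are hypothetical placeholders so the sketch runs end to end; they are not the VSF model.

```python
import numpy as np

def plan_actions(dynamics, cost, obs, goal, horizon=5, samples=200, elites=20, iters=3):
    """CEM: sample action sequences, roll out the learned model, refit to the best."""
    act_dim = 4                                    # e.g., pick point (x, y) + pull vector
    mu, sigma = np.zeros((horizon, act_dim)), np.ones((horizon, act_dim))
    for _ in range(iters):
        seqs = mu + sigma * np.random.randn(samples, horizon, act_dim)
        costs = []
        for seq in seqs:
            pred = obs
            for a in seq:                          # roll out predicted observations
                pred = dynamics(pred, a)
            costs.append(cost(pred, goal))         # e.g., pixel distance to goal image
        best = seqs[np.argsort(costs)[:elites]]
        mu, sigma = best.mean(0), best.std(0)      # refit distribution to elites
    return mu[0]                                   # execute first action, then replan

# Dummy stand-ins so the sketch runs end to end:
dynamics = lambda o, a: o + 0.01 * a.sum()
cost = lambda pred, goal: float(np.abs(pred - goal).sum())
action = plan_actions(dynamics, cost, obs=np.zeros(1), goal=np.ones(1))
```

Because the policy is goal-conditioned only through the cost on predicted observations, swapping the goal image retargets the same learned dynamics to a different sequential task, which is the reuse property the abstract emphasizes.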
Deep Imitation Learning of Sequential Fabric Smoothing Policies
Seita, Daniel, Ganapathi, Aditya, Hoque, Ryan, Hwang, Minho, Cen, Edward, Tanwani, Ajay Kumar, Balakrishna, Ashwin, Thananjeyan, Brijen, Ichnowski, Jeffrey, Jamali, Nawid, Yamane, Katsu, Iba, Soshi, Canny, John, Goldberg, Ken
Sequential pulling policies to flatten and smooth fabrics have applications from surgery to manufacturing to home tasks such as bed making and folding clothes. Due to the complexity of fabric states and dynamics, we apply deep imitation learning to learn policies that, given color or depth images of a rectangular fabric sample, estimate pick points and pull vectors to spread the fabric to maximize coverage. To generate data, we develop a fabric simulator and an algorithmic demonstrator that has access to complete state information. We train policies in simulation using domain randomization and dataset aggregation (DAgger) on three tiers of difficulty in the initial randomized configuration. We present results comparing five baseline policies to learned policies and report systematic comparisons of color vs. depth images as inputs. In simulation, learned policies achieve comparable or superior performance to analytic baselines. In 120 physical experiments with the da Vinci Research Kit (dVRK) surgical robot, policies trained in simulation attain 86% and 69% final coverage for color and depth inputs, respectively, suggesting the feasibility of learning fabric smoothing policies from simulation. Supplementary material is available at https://sites.google.com/view/fabric-smoothing.
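The dataset aggregation (DAgger) loop referenced above can be sketched generically. The environment, demonstrator, and linear policy below are toy stand-ins, not the paper's fabric simulator or network architecture; the key property shown is that the learner's own actions drive the rollouts while the algorithmic demonstrator labels every visited state.

```python
import numpy as np

class LinearPolicy:
    """Toy learner: linear map from observation to action, refit by least squares."""
    def __init__(self, obs_dim=4, act_dim=2):
        self.W = np.zeros((obs_dim, act_dim))
    def __call__(self, obs):
        return obs @ self.W
    def fit(self, data):
        X = np.stack([o for o, _ in data])
        Y = np.stack([a for _, a in data])
        self.W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def dagger(reset, step, expert, policy, rounds=5, steps=20):
    """Dataset aggregation: roll out the learner, label visited states with the
    demonstrator, retrain on the growing dataset."""
    data = []
    for _ in range(rounds):
        obs = reset()
        for _ in range(steps):
            data.append((obs, expert(obs)))   # expert labels the learner's states
            obs = step(obs, policy(obs))      # the learner's action drives the rollout
        policy.fit(data)                      # supervised regression on aggregated data
    return policy

# Toy stand-ins so the loop runs end to end:
rng = np.random.default_rng(0)
reset = lambda: rng.normal(size=4)
step = lambda obs, act: obs + 0.1 * rng.normal(size=4)   # act ignored in this toy
expert = lambda obs: -obs[:2]                            # hypothetical corrective action
policy = dagger(reset, step, expert, LinearPolicy())
```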
Robot Bed-Making: Deep Transfer Learning Using Depth Sensing of Deformable Fabric
Seita, Daniel, Jamali, Nawid, Laskey, Michael, Berenstein, Ron, Tanwani, Ajay Kumar, Baskaran, Prakash, Iba, Soshi, Canny, John, Goldberg, Ken
Bed-making is a common task well-suited for home robots since it is tolerant to error and not time-critical. Bed-making can also be difficult for senior citizens and those with limited mobility due to the bending and reaching movements required. Autonomous bed-making combines multiple challenges in robotics: perception in unstructured environments, deformable object manipulation, transfer learning, and sequential decision making. We formalize the bed-making problem as one of maximizing surface coverage with a blanket, and explore algorithmic approaches that use deep learning on depth images to be invariant to the color and pattern of the blankets. We train two networks: one to identify a corner of the blanket and another to determine when to transition to the other side of the bed. Using the first network, the robot grasps at its estimate of the blanket corner and then pulls it to the appropriate corner of the bed frame. The second network estimates if the robot has sufficiently covered one side and can transition to the other, or if it should attempt another grasp from the same side. We evaluate with two robots, the Toyota HSR and the Fetch, and three blankets. Using 2018 and 654 depth images for training the grasp and transition networks, respectively, experiments with a quarter-scale twin bed achieve an average of 91.7% blanket coverage, nearly matching human supervisors with 95.0% coverage. Data is available at https://sites.google.com/view/bed-make.
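The two-network control loop described above can be sketched as follows. The robot interface and network stand-ins are hypothetical placeholders, not the HSR/Fetch API or the trained models.

```python
import numpy as np

class DummyRobot:
    """Stand-in for a mobile manipulator interface (hypothetical method names)."""
    def move_to_side(self, side): pass
    def get_depth_image(self): return np.random.rand(480, 640)
    def bed_corner(self, side): return (0, 0)
    def grasp_and_pull(self, pixel, target): pass

def make_bed(robot, grasp_net, transition_net, max_grasps=8):
    """Grasp at the predicted blanket corner; switch sides once the transition
    network judges the current side sufficiently covered."""
    for side in ("near", "far"):
        robot.move_to_side(side)
        for _ in range(max_grasps):
            depth = robot.get_depth_image()      # depth-only input: invariant to blanket color
            u, v = grasp_net(depth)              # pixel estimate of a blanket corner
            robot.grasp_and_pull((u, v), robot.bed_corner(side))
            if transition_net(robot.get_depth_image()) > 0.5:
                break                            # this side is covered; move on

# Dummy networks so the loop runs end to end:
grasp_net = lambda d: np.unravel_index(d.argmax(), d.shape)
transition_net = lambda d: float(d.mean())
make_bed(DummyRobot(), grasp_net, transition_net)
```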