Ju, Yuanchen
GRACE: Generalizing Robot-Assisted Caregiving with User Functionality Embeddings
Liu, Ziang, Ju, Yuanchen, Da, Yu, Silver, Tom, Thakkar, Pranav N., Li, Jenna, Guo, Justin, Dimitropoulou, Katherine, Bhattacharjee, Tapomayukh
Robot caregiving should be personalized to meet the diverse needs of care recipients -- assisting with tasks as needed, while taking user agency in action into account. In physical tasks such as handover, bathing, dressing, and rehabilitation, a key aspect of this diversity is the functional range of motion (fROM), which can vary significantly between individuals. In this work, we learn to predict personalized fROM as a way to generalize robot decision-making in a wide range of caregiving tasks. We propose a novel data-driven method for predicting personalized fROM using functional assessment scores from occupational therapy. We develop a neural model that learns to embed functional assessment scores into a latent representation of the user's physical function. The model is trained using motion capture data collected from users with emulated mobility limitations. After training, the model predicts personalized fROM for new users without motion capture. Through simulated experiments and a real-robot user study, we show that the personalized fROM predictions from our model enable the robot to provide personalized and effective assistance while improving the user's agency in action. See our website for more visualizations: https://emprise.cs.cornell.edu/grace/.
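A minimal sketch (not the authors' released code) of the modeling idea described above: an encoder maps occupational-therapy assessment scores to a latent embedding of the user's physical function, and a decoder predicts reachability over a discretized workspace as a proxy for personalized fROM. All layer sizes, names, and the workspace discretization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FunctionalEmbeddingModel(nn.Module):
    def __init__(self, num_scores=10, latent_dim=16, num_workspace_cells=512):
        super().__init__()
        # Encoder: assessment scores -> latent user-function embedding
        self.encoder = nn.Sequential(
            nn.Linear(num_scores, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )
        # Decoder: latent embedding -> per-cell reachability of a
        # discretized workspace (a proxy for personalized fROM)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, num_workspace_cells),
        )

    def forward(self, scores):
        z = self.encoder(scores)               # latent user embedding
        return torch.sigmoid(self.decoder(z))  # reachability in [0, 1]

# Training would regress these predictions against reachability labels
# derived from motion-capture data of users with emulated limitations.
model = FunctionalEmbeddingModel()
scores = torch.rand(4, 10)        # a batch of assessment scores (hypothetical)
reach_prob = model(scores)        # (4, 512) predicted fROM per workspace cell
```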
DenseMatcher: Learning 3D Semantic Correspondence for Category-Level Manipulation from a Single Demo
Zhu, Junzhe, Ju, Yuanchen, Zhang, Junyi, Wang, Muhan, Yuan, Zhecheng, Hu, Kaizhe, Xu, Huazhe
Dense 3D correspondence can enhance robotic manipulation by enabling the generalization of spatial, functional, and dynamic information from one object to an unseen counterpart. Compared to shape correspondence, semantic correspondence is more effective in generalizing across different object categories. DenseMatcher first computes vertex features by projecting multiview 2D features onto meshes and refining them with a 3D network, and subsequently finds dense correspondences with the obtained features using functional maps. In addition, we craft the first 3D matching dataset that contains colored object meshes across diverse categories. In our experiments, we show that DenseMatcher significantly outperforms prior 3D matching baselines by 43.5%. We demonstrate the downstream effectiveness of DenseMatcher in (i) robotic manipulation, where it achieves cross-instance and cross-category generalization on long-horizon complex manipulation tasks from observing only one demo; (ii) zero-shot color mapping between digital assets, where appearance can be transferred between different objects with relatable geometry.
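A simplified sketch of the final matching step described above. DenseMatcher itself uses functional maps; to keep this example short, correspondences are instead found by cosine-similarity nearest neighbors between per-vertex features of a source and a target mesh. Shapes, names, and the annotated grasp vertex are assumptions.

```python
import torch
import torch.nn.functional as F

def dense_correspondence(src_feats, tgt_feats):
    """src_feats: (Ns, D) per-vertex features of the source mesh.
       tgt_feats: (Nt, D) per-vertex features of the target mesh.
       Returns a (Ns,) tensor: matching target vertex index per source vertex."""
    src = F.normalize(src_feats, dim=-1)
    tgt = F.normalize(tgt_feats, dim=-1)
    sim = src @ tgt.T               # (Ns, Nt) cosine similarities
    return sim.argmax(dim=-1)       # nearest target vertex per source vertex

# Transferring a grasp point annotated on the source mesh to an unseen target:
src_feats = torch.randn(2000, 64)   # e.g., projected 2D features refined in 3D
tgt_feats = torch.randn(2500, 64)
match = dense_correspondence(src_feats, tgt_feats)
grasp_vertex_on_target = match[1234]   # 1234 = hypothetical grasp vertex on source
```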
Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
Ju, Yuanchen, Hu, Kaizhe, Zhang, Guowei, Zhang, Gu, Jiang, Mingrun, Xu, Huazhe
Enabling robotic manipulation that generalizes to out-of-distribution scenes is a crucial step toward open-world embodied intelligence. For human beings, this ability is rooted in the understanding of semantic correspondence among objects, which naturally transfers the interaction experience of familiar objects to novel ones. Although robots lack such a reservoir of interaction experience, the vast availability of human videos on the Internet may serve as a valuable resource, from which we extract an affordance memory that includes the contact points. Inspired by the natural way humans think, we propose Robo-ABC: when confronted with unfamiliar objects that require generalization, the robot can acquire affordance by retrieving objects that share visual or semantic similarities from the affordance memory. The next step is to map the contact points of the retrieved objects to the new object. While establishing this correspondence may present formidable challenges at first glance, recent research finds that it naturally arises from pre-trained diffusion models, enabling affordance mapping even across disparate object categories. Through the Robo-ABC framework, robots may generalize to manipulate out-of-category objects in a zero-shot manner without any manual annotation, additional training, part segmentation, pre-coded knowledge, or viewpoint restrictions. Quantitatively, Robo-ABC significantly enhances the accuracy of visual affordance retrieval by a large margin of 31.6% compared to state-of-the-art (SOTA) end-to-end affordance models. We also conduct real-world experiments on cross-category object-grasping tasks, where Robo-ABC achieves a success rate of 85.7%, demonstrating its capacity for real-world tasks.
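A minimal sketch of the retrieve-then-map pipeline described above (not the authors' implementation). The memory stores per-object embeddings together with 2D contact points extracted from human videos; mapping onto the new object is stubbed as a nearest-neighbor lookup in a dense feature map, standing in for diffusion-feature semantic correspondence. All names, shapes, and data here are illustrative placeholders.

```python
import numpy as np

def retrieve(memory, query_embedding):
    """memory: list of dicts with 'embedding' (D,), 'contact_uv' (2,), 'features' (H, W, D)."""
    sims = [query_embedding @ m["embedding"] for m in memory]
    return memory[int(np.argmax(sims))]

def map_contact_point(src_entry, tgt_features):
    """Map the retrieved contact point onto the new object via dense feature matching."""
    u, v = src_entry["contact_uv"]
    src_feat = src_entry["features"][v, u]        # (D,) feature at the contact pixel
    H, W, D = tgt_features.shape
    sims = tgt_features.reshape(-1, D) @ src_feat
    idx = int(np.argmax(sims))
    return idx % W, idx // W                      # (u, v) on the new object

# Usage with random placeholder features:
memory = [{"embedding": np.random.randn(128),
           "contact_uv": (12, 34),
           "features": np.random.randn(64, 64, 128)} for _ in range(5)]
query = np.random.randn(128)
entry = retrieve(memory, query)
contact_on_new_object = map_contact_point(entry, np.random.randn(64, 64, 128))
```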
IRRGN: An Implicit Relational Reasoning Graph Network for Multi-turn Response Selection
Deng, Jingcheng, Dai, Hengwei, Guo, Xuewei, Ju, Yuanchen, Peng, Wei
The task of response selection in multi-turn dialogue is to find the best option from all candidates. To improve the reasoning ability of the model, previous studies pay more attention to using explicit algorithms to model the dependencies between utterances, which are deterministic, limited, and inflexible. In addition, few studies consider differences between the options before and after reasoning. In this paper, we propose an Implicit Relational Reasoning Graph Network to address these issues, which consists of the Utterance Relational Reasoner (URR) and the Option Dual Comparator (ODC). URR aims to implicitly extract dependencies between utterances, as well as between utterances and options, and to perform reasoning with relational graph convolutional networks. ODC focuses on perceiving the difference between the options through dual comparison, which can eliminate the interference of noise options. Experimental results on two multi-turn dialogue reasoning benchmark datasets, MuTual and MuTual+, show that our method significantly improves over the baselines of four pretrained language models and achieves state-of-the-art performance. The model surpasses human performance for the first time on the MuTual dataset.
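A minimal sketch, assuming a plain pairwise-contrast design, of the Option Dual Comparator idea: each candidate response is scored by explicitly contrasting its representation against the other options, so that noise shared by all options tends to cancel out. The dimensions, comparison function, and aggregation are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class OptionDualComparator(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.compare = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.score = nn.Linear(dim, 1)

    def forward(self, options):
        # options: (num_options, dim) -- one encoded vector per candidate response
        n, d = options.shape
        a = options.unsqueeze(1).expand(n, n, d)                 # option i
        b = options.unsqueeze(0).expand(n, n, d)                 # option j
        pairwise = self.compare(torch.cat([a, b - a], dim=-1))   # contrast i against j
        contrasted = pairwise.mean(dim=1)                        # aggregate over the other options
        return self.score(contrasted).squeeze(-1)                # one logit per option

logits = OptionDualComparator()(torch.randn(4, 256))   # scores for 4 candidates
best = logits.argmax()
```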
ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch
Xue, Zhengrong, Zhang, Han, Cheng, Jingwen, He, Zhengmao, Ju, Yuanchen, Lin, Changyi, Zhang, Gu, Xu, Huazhe
The notion of robotic manipulation [1, 2] easily invokes the image of a biomimetic robot arm or hand trying to grasp tabletop objects and then rearrange them into desired configurations inferred by exteroceptive sensors such as RGBD cameras. To facilitate this manipulation pipeline, the robot learning community has made tremendous efforts, either in determining steadier grasping poses in demanding scenarios [3, 4, 5, 6, 7] or in understanding the exteroceptive inputs in a more robust and generalizable way [8, 9, 10, 11, 12, 13]. Acknowledging this progress, this paper attempts to bypass the challenges in the prevailing pipeline by advocating ArrayBot, a reinforcement-learning-driven system for distributed manipulation [14], where the objects are manipulated through a great number of actuators with only proprioceptive tactile sensing [15, 16, 17, 18]. Conceptually, the hardware of ArrayBot is a 16 × 16 array of vertically sliding pillars, each of which can be independently actuated, leading to a 16 × 16 action space. Functionally, the actuators beneath a tabletop object can support its weight and at the same time cooperate to lift, tilt, and even translate it through proper motion policies.
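A toy sketch (not the real system) of the 16 × 16 action interface described above: each action sets relative height changes for a grid of vertically sliding pillars, and the only observation is proprioceptive (the pillar heights themselves). The limits, step dynamics, and class name are placeholder assumptions.

```python
import numpy as np

class ArrayBotLikeEnv:
    GRID = 16
    MAX_HEIGHT = 0.1     # meters, illustrative limit
    MAX_STEP = 0.005     # per-step height change, illustrative

    def __init__(self):
        self.heights = np.zeros((self.GRID, self.GRID))

    def reset(self):
        self.heights[:] = 0.0
        return self.heights.copy()          # proprioceptive observation only

    def step(self, action):
        # action: (16, 16) array in [-1, 1], scaled to a height increment per pillar
        delta = np.clip(action, -1.0, 1.0) * self.MAX_STEP
        self.heights = np.clip(self.heights + delta, 0.0, self.MAX_HEIGHT)
        reward, done = 0.0, False           # task-specific reward omitted
        return self.heights.copy(), reward, done, {}

env = ArrayBotLikeEnv()
obs = env.reset()
obs, r, done, info = env.step(np.random.uniform(-1, 1, size=(16, 16)))
```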