
Collaborating Authors

Jiang, Helen


Cable Routing and Assembly using Tactile-driven Motion Primitives

arXiv.org Artificial Intelligence

Manipulating cables is challenging for robots because of the cables' infinite degrees of freedom and frequent occlusion by the gripper and the environment. These challenges are further complicated by the dexterous operations that cable routing and assembly require, such as weaving and inserting, which hamper solutions that rely on vision-only sensing. In this paper, we propose to integrate tactile-guided low-level motion control with high-level vision-based task parsing for a challenging task: cable routing and assembly on a reconfigurable task board. Specifically, we build a library of tactile-guided motion primitives using a fingertip GelSight sensor, where each primitive reliably accomplishes an operation such as cable following or weaving. The overall task is inferred via visual perception from a goal configuration image and then used to generate the primitive sequence. Experiments demonstrate the effectiveness of both the individual tactile-guided primitives and the integrated end-to-end solution, which significantly outperforms the same pipeline without tactile sensing. Our reconfigurable task setup and proposed baselines provide a benchmark for future research in cable manipulation. More details and videos are available at https://helennn.github.io/cable-manip/
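
To make the described architecture concrete, here is a minimal sketch (not the authors' code) of how a library of tactile-guided primitives could be driven by a vision-parsed goal configuration. Every name here (TactileState, parse_goal_image, the primitive functions) is a hypothetical placeholder.

```python
# A minimal sketch, not the authors' implementation: one plausible way to organize
# tactile-guided motion primitives behind a vision-parsed goal configuration.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TactileState:
    """Stand-in for fingertip GelSight readings."""
    cable_in_grasp: bool
    cable_angle_deg: float  # orientation of the cable across the fingertip


def follow_cable(tactile: TactileState) -> bool:
    """Slide along the cable, servoing on its angle in the tactile image."""
    return tactile.cable_in_grasp


def weave_cable(tactile: TactileState) -> bool:
    """Route the cable around a fixture while keeping it in the grasp."""
    return tactile.cable_in_grasp


def insert_cable(tactile: TactileState) -> bool:
    """Push the cable end into a clip, confirming seating via contact."""
    return tactile.cable_in_grasp


PRIMITIVES: Dict[str, Callable[[TactileState], bool]] = {
    "follow": follow_cable,
    "weave": weave_cable,
    "insert": insert_cable,
}


def parse_goal_image(goal_image) -> List[str]:
    """Vision-based task parsing: map a goal-board image to a primitive sequence.

    Placeholder only; a real parser would detect fixtures and their ordering.
    """
    return ["follow", "weave", "insert"]


def route_cable(goal_image, tactile: TactileState) -> bool:
    """Execute the inferred primitive sequence; each primitive reports success."""
    for name in parse_goal_image(goal_image):
        if not PRIMITIVES[name](tactile):
            return False  # primitive failed; a real system would retry or replan
    return True
```

The design choice sketched here, closed-loop primitives that each report success or failure to a sequencer, mirrors the abstract's split between tactile-guided low-level control and vision-based high-level task parsing.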


On Two XAI Cultures: A Case Study of Non-technical Explanations in Deployed AI System

arXiv.org Artificial Intelligence

Explainable AI (XAI) research has been booming, but the question "To whom are we making AI explainable?" has yet to gain sufficient attention. Not much of XAI is comprehensible to non-AI experts, who are nonetheless the primary audience and major stakeholders of deployed AI systems in practice. The gap is glaring: what counts as "explained" for AI experts versus non-experts is very different in practical scenarios. This gap has thus produced two distinct cultures of expectations, goals, and forms of XAI in real-life AI deployments. We advocate that it is critical to develop XAI methods for non-technical audiences. We then present a real-life case study in which AI experts provided non-technical explanations of AI decisions to non-technical stakeholders and completed a successful deployment in a highly regulated industry. Finally, we synthesize lessons learned from the case and share a list of suggestions for AI experts to consider when explaining AI decisions to non-technical stakeholders.


Semantic Curiosity for Active Visual Learning

arXiv.org Artificial Intelligence

In this paper, we study the task of embodied interactive learning for object detection. Given a set of environments (and some labeling budget), our goal is to learn an object detector by having an agent select which data to obtain labels for. How should an exploration policy decide which trajectory should be labeled? One possibility is to use a trained object detector's failure cases as an external reward. However, this would require labeling the millions of frames needed to train RL policies, which is infeasible. Instead, we explore a self-supervised approach for training our exploration policy by introducing a notion of semantic curiosity. Our semantic curiosity policy is based on a simple observation: the detection outputs should be consistent. Therefore, our semantic curiosity rewards trajectories with inconsistent labeling behavior and encourages the exploration policy to explore such areas. The exploration policy trained via semantic curiosity generalizes to novel scenes and helps train an object detector that outperforms baselines trained with other possible alternatives such as random exploration, prediction-error curiosity, and coverage-maximizing exploration.
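
As a rough illustration of the inconsistency reward described above (a sketch under our own assumptions, not the paper's implementation), suppose detections along a trajectory have already been associated to the same physical object or map cell; the reward can then be taken as the mean entropy of each object's predicted labels, which is high exactly when the detector keeps changing its mind.

```python
# Hypothetical sketch of a semantic-curiosity-style reward: mean label entropy
# over tracked objects. Association of detections across frames is assumed done.
import numpy as np


def label_entropy(class_ids: np.ndarray, num_classes: int) -> float:
    """Entropy of the empirical label distribution for one tracked object."""
    counts = np.bincount(class_ids, minlength=num_classes).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def semantic_curiosity_reward(per_object_labels, num_classes: int) -> float:
    """High when the detector's labels for the same object are inconsistent."""
    if not per_object_labels:
        return 0.0
    return float(np.mean([label_entropy(np.asarray(labels), num_classes)
                          for labels in per_object_labels]))


# Example: object 0 is labeled consistently, object 1 flips between classes,
# so the trajectory earns a positive curiosity reward.
print(semantic_curiosity_reward([[2, 2, 2, 2], [1, 3, 1, 5]], num_classes=10))
```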


Canonical Correlation Inference for Mapping Abstract Scenes to Text

AAAI Conferences

We describe a technique for structured prediction based on canonical correlation analysis. Our learning algorithm finds two projections, one for the input space and one for the output space, that aim to project a given input and its correct output to points close to each other. We demonstrate our technique on a language-vision problem: giving a textual description to an "abstract scene".
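
For reference, the standard CCA objective that this setup builds on can be written as follows; the notation is ours, and the nearest-projection inference rule below is only one plausible reading of the abstract, not necessarily the paper's exact decoder.

```latex
% Standard CCA: find projection directions u, v that maximize the correlation
% between projected inputs x and outputs y (Sigma_xy etc. are cross/auto-covariances).
\[
  (u^*, v^*) \;=\; \arg\max_{u,\,v}\;
  \frac{u^{\top} \Sigma_{xy} v}
       {\sqrt{u^{\top} \Sigma_{xx} u}\;\sqrt{v^{\top} \Sigma_{yy} v}}
\]
% With learned projection matrices U, V (stacking the top components), a new input x
% can be mapped to a textual description by picking the candidate output y whose
% projection lies closest to the projected input:
\[
  \hat{y} \;=\; \arg\min_{y \in \mathcal{Y}} \;\bigl\| U^{\top} x - V^{\top} y \bigr\|_2 .
\]
```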