Ji, Tianchen
Learning Coordinated Bimanual Manipulation Policies using State Diffusion and Inverse Dynamics Models
Chen, Haonan, Xu, Jiaming, Sheng, Lily, Ji, Tianchen, Liu, Shuijing, Li, Yunzhu, Driggs-Campbell, Katherine
When performing tasks like laundry, humans naturally coordinate both hands to manipulate objects and anticipate how their actions will change the state of the clothes. However, achieving such coordination in robotics remains challenging due to the need to model object movement, predict future states, and generate precise bimanual actions. In this work, we address these challenges by infusing the predictive nature of human manipulation strategies into robot imitation learning. Specifically, we disentangle task-related state transitions from agent-specific inverse dynamics modeling to enable effective bimanual coordination. Using a demonstration dataset, we train a diffusion model to predict future states given historical observations, envisioning how the scene evolves. Then, we use an inverse dynamics model to compute the robot actions that achieve the predicted states. Our key insight is that modeling object movement can help in learning policies for coordinated bimanual manipulation tasks. Evaluating our framework across diverse simulation and real-world manipulation setups, including multimodal goal configurations, bimanual manipulation, deformable objects, and multi-object setups, we find that it consistently outperforms state-of-the-art state-to-action mapping policies. Our method demonstrates a remarkable capacity to navigate multimodal goal configurations and action distributions, maintain stability across different control modes, and synthesize a broader range of behaviors than those present in the demonstration dataset.

Many everyday bimanual manipulation tasks, such as cooking or sorting laundry, are simple for humans but remain challenging for robots. Humans naturally anticipate how their actions will influence object states, using predictive reasoning to guide movements [1], [2]. Unlike single-arm tasks, which primarily involve independent end-effectors, bimanual tasks demand cooperative force distribution, complex spatial planning, and interaction-aware control, making it difficult for robots to achieve stability and precision, especially in tasks involving deformable or multiple objects. Despite recent advances in robotic manipulation [3]-[6], bimanual coordination remains an open challenge due to the intricate interplay between robot actions and object dynamics.
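To make the disentangled prediction-then-control idea concrete, here is a minimal PyTorch sketch: a toy denoiser iteratively refines a future state from a history of observations, and a separate inverse dynamics network recovers the action that realizes it. The network shapes, the MLP architectures, and the crude refinement loop are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class StateDenoiser(nn.Module):
    """Toy denoiser: predicts the noise on a candidate future state,
    conditioned on a history of observed states and a diffusion step."""
    def __init__(self, state_dim, hist_len):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim * (hist_len + 1) + 1, 256), nn.ReLU(),
            nn.Linear(256, state_dim),
        )

    def forward(self, noisy_future, history, t):
        # history: (B, hist_len, state_dim); t: (B, 1) normalized step
        x = torch.cat([noisy_future, history.flatten(1), t], dim=-1)
        return self.net(x)

class InverseDynamics(nn.Module):
    """Maps (current state, predicted next state) to a bimanual action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, s_t, s_next):
        return self.net(torch.cat([s_t, s_next], dim=-1))

@torch.no_grad()
def act(denoiser, inv_dyn, history, n_steps=10):
    """Refine a noisy future state, then recover the action that reaches it."""
    s_t = history[:, -1]
    s_next = torch.randn_like(s_t)  # start from pure noise
    for k in reversed(range(n_steps)):
        t = torch.full((s_t.shape[0], 1), k / n_steps)
        eps = denoiser(s_next, history, t)
        # Crude refinement step; a real DDPM/DDIM sampler uses a noise schedule.
        s_next = s_next - eps / n_steps
    return inv_dyn(s_t, s_next)
```

Because the two models are trained separately on the same demonstrations, the state predictor captures task-level scene evolution while the inverse dynamics model handles the agent-specific mapping to bimanual actions.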
An Expert Ensemble for Detecting Anomalous Scenes, Interactions, and Behaviors in Autonomous Driving
Ji, Tianchen, Chakraborty, Neeloy, Schreiber, Andre, Driggs-Campbell, Katherine
Autonomous driving is at a critical stage in revolutionizing transportation systems and reshaping societal norms. More than 1,400 self-driving cars, trucks, and other vehicles are currently in operation or testing in the U.S. (Etherington 2019), and 4.5 million autonomous vehicles are expected to run on U.S. roads by 2030 (Meyer 2023). While autonomous driving promises to improve traffic efficiency and personal mobility, safety is a prerequisite for all of these potential gains and has become the first priority in practice (Du et al. 2020). In October 2023, Cruise, one of the leading autonomous driving companies, was ordered by California to stop operating driverless cars in the state after one of Cruise's cars struck a pedestrian in San Francisco (Kerr 2023). The rare incident involved a woman who was first hit by a human driver and then thrown onto the road in front of a Cruise vehicle. The Cruise vehicle then rolled over the pedestrian and finally stopped on top of her, causing serious injuries. Such an accident reflects one of the greatest challenges in autonomous driving: the safety of an autonomous car is largely determined by its ability to detect and react to rare scenarios, rather than to the common normal situations that have been well considered during development. Although rare in a long-tailed distribution, unusual driving scenarios do happen and can have a large impact on driving safety. To mitigate the impact of abnormal ego behaviors when operating outside the design domain, a detection system for anomalous driving scenarios is necessary; its output can potentially serve as a high-level decision signal for motion planning.
Interaction-aware Conformal Prediction for Crowd Navigation
Huang, Zhe, Ji, Tianchen, Zhang, Heling, Pouria, Fatemeh Cheraghi, Driggs-Campbell, Katherine, Dong, Roy
During crowd navigation, the robot's motion plan needs to account for human motion uncertainty, and this uncertainty in turn depends on the robot's motion plan. We introduce Interaction-aware Conformal Prediction (ICP) to alternate between uncertainty-aware robot motion planning and decision-dependent human motion uncertainty quantification. ICP is composed of a trajectory predictor to predict human trajectories, a model predictive controller that plans robot motion with confidence interval radii added for probabilistic safety, a human simulator to collect a human trajectory calibration dataset conditioned on the planned robot motion, and a conformal prediction module to quantify trajectory prediction error on the decision-dependent calibration dataset. Crowd navigation simulation experiments show that ICP strikes a good balance among navigation efficiency, social awareness, and uncertainty quantification compared to previous works, and that it generalizes well to navigation tasks under various crowd densities. Its fast runtime and efficient memory usage make ICP practical for real-world applications. Code is available at https://github.com/tedhuang96/icp.
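The conformal prediction step in this pipeline can be illustrated with a short sketch of split conformal prediction over trajectory errors; the nonconformity score, the array shapes, and the function name below are assumptions for illustration, not necessarily ICP's exact choices.

```python
import numpy as np

def conformal_radius(pred_traj, true_traj, alpha=0.1):
    """Split conformal prediction: compute a radius that covers the
    trajectory prediction error with probability >= 1 - alpha.

    pred_traj, true_traj: (N, T, 2) predicted / observed human positions
    from a calibration set collected conditioned on the planned robot motion.
    """
    # Nonconformity score: worst per-step Euclidean prediction error.
    scores = np.linalg.norm(pred_traj - true_traj, axis=-1).max(axis=-1)  # (N,)
    n = len(scores)
    # Finite-sample-corrected empirical quantile of the calibration scores.
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return q

# The radius would then inflate the collision constraints inside the MPC,
# e.g. ||p_robot - p_human_pred|| >= d_safe + conformal_radius(...).
```

Because the calibration set is regenerated under the currently planned robot motion, the quantified uncertainty stays consistent with the decision that induced it.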
A Data-Efficient Visual-Audio Representation with Intuitive Fine-tuning for Voice-Controlled Robots
Chang, Peixin, Liu, Shuijing, Ji, Tianchen, Chakraborty, Neeloy, Hong, Kaiwen, Driggs-Campbell, Katherine
A command-following robot that serves people in everyday life must continually improve itself in deployment domains with minimal help from its end users, rather than from engineers. Previous methods are either difficult to improve continuously after deployment or require a large number of new labels during fine-tuning. Motivated by (self-)supervised contrastive learning, we propose a novel representation that generates an intrinsic reward function for command-following robot tasks by associating images with sound commands. After the robot is deployed in a new domain, the representation can be updated intuitively and data-efficiently by non-experts, without any hand-crafted reward functions. We demonstrate our approach on various sound types and robotic tasks, including navigation and manipulation with raw sensor inputs. In simulated and real-world experiments, we show that our system can continually self-improve in previously unseen scenarios with less newly labeled data, while still achieving better performance than previous methods.
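As a rough sketch of how a contrastive objective can associate images with sound commands and then yield an intrinsic reward, consider the following; the symmetric InfoNCE form and the cosine-similarity reward are common choices assumed here for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(img_emb, snd_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning image and sound-command embeddings.
    Paired (image, sound) examples are positives; every other pairing in
    the batch serves as a negative."""
    img = F.normalize(img_emb, dim=-1)
    snd = F.normalize(snd_emb, dim=-1)
    logits = img @ snd.t() / temperature  # (B, B) similarity matrix
    labels = torch.arange(len(img), device=img.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

def intrinsic_reward(obs_emb, cmd_emb):
    """Reward the policy for reaching observations whose embedding
    matches the embedding of the commanded sound."""
    return F.cosine_similarity(obs_emb, cmd_emb, dim=-1)
```

Under this scheme, a non-expert fine-tunes the robot simply by supplying a few new (image, sound) pairs from the deployment domain; no reward engineering is required.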
An Attentional Recurrent Neural Network for Occlusion-Aware Proactive Anomaly Detection in Field Robot Navigation
Schreiber, Andre, Ji, Tianchen, McPherson, D. Livingston, Driggs-Campbell, Katherine
The use of mobile robots in unstructured environments like the agricultural field is becoming increasingly common. The ability of such field robots to proactively identify and avoid failures is thus crucial for ensuring efficiency and avoiding damage. However, the cluttered field environment introduces various sources of noise (such as sensor occlusions) that make proactive anomaly detection difficult. Existing approaches can perform poorly in sensor occlusion scenarios, as they typically do not explicitly model occlusions and leverage only current sensory inputs. In this work, we present an attention-based recurrent neural network architecture for proactive anomaly detection that fuses current sensory inputs and planned control actions with a latent representation of prior robot state. We enhance our model with an explicitly learned model of sensor occlusion that modulates the use of the latent representation of prior robot state. Our method shows improved anomaly detection performance and makes mobile field robots more resilient to falsely predicting navigation failures during periods of sensor occlusion, particularly when all sensors are briefly occluded. Our code is available at: https://github.com/andreschreiber/roar
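A minimal sketch of the idea, assuming a GRU backbone and a scalar occlusion gate (both illustrative choices, not the paper's exact architecture): a learned occlusion score decides how much to trust the current inputs versus the latent summary of prior robot state.

```python
import torch
import torch.nn as nn

class OcclusionAwareDetector(nn.Module):
    """GRU-based proactive anomaly detector: fuses sensory inputs and
    planned control actions with the prior hidden state, gated by a
    learned occlusion score."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim + act_dim, hidden)
        self.occlusion = nn.Sequential(nn.Linear(obs_dim, 1), nn.Sigmoid())
        self.gru = nn.GRUCell(hidden, hidden)
        self.head = nn.Linear(hidden, 1)  # probability of upcoming failure

    def forward(self, obs, act, h_prev):
        x = torch.relu(self.encoder(torch.cat([obs, act], dim=-1)))
        occ = self.occlusion(obs)         # ~1 when the inputs look occluded
        h_new = self.gru(x, h_prev)
        # Under occlusion, fall back on the latent summary of prior robot
        # state rather than the unreliable current inputs.
        h = occ * h_prev + (1 - occ) * h_new
        return torch.sigmoid(self.head(h)), h
```

Gating the recurrent update this way is what lets the detector ride out brief periods where all sensors are occluded without raising spurious failure predictions.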
Structural Attention-Based Recurrent Variational Autoencoder for Highway Vehicle Anomaly Detection
Chakraborty, Neeloy, Hasan, Aamir, Liu, Shuijing, Ji, Tianchen, Liang, Weihang, McPherson, D. Livingston, Driggs-Campbell, Katherine
In autonomous driving, detection of abnormal driving behaviors is essential to ensure the safety of vehicle controllers. Prior works in vehicle anomaly detection have shown that modeling interactions between agents improves detection accuracy, but certain abnormal behaviors where structured road information is paramount, such as wrong-way and off-road driving, are poorly identified. We propose a novel unsupervised framework for highway anomaly detection named Structural Attention-Based Recurrent VAE (SABeR-VAE), which explicitly uses the structure of the environment to aid anomaly identification. Specifically, we use a vehicle self-attention module to learn the relations among vehicles on a road, and a separate lane-vehicle attention module to model the importance of permissible lanes to aid in trajectory prediction. Conditioned on the attention modules' outputs, a recurrent encoder-decoder architecture with a latent space propagated by a stochastic Koopman operator predicts the next states of vehicles. Our model is trained end-to-end to minimize prediction loss on normal vehicle behaviors and is deployed to detect anomalies in (ab)normal scenarios. By combining heterogeneous vehicle and lane information, SABeR-VAE and its deterministic variant, SABeR-AE, improve AUPR on abnormal scenarios by 18% and 25%, respectively, over STGAE-KDE on the simulated MAAD highway dataset. Furthermore, we show that the learned Koopman operator in SABeR-VAE enforces interpretable structure in the variational latent space. These results show that modeling environmental factors is essential for detecting a diverse set of anomalies in deployment. For code implementation, please visit https://sites.google.com/illinois.edu/saber-vae.
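The interpretable latent structure comes from the Koopman assumption that latent states evolve linearly. A minimal sketch of that piece (the class name and identity initialization are hypothetical, for illustration only):

```python
import torch
import torch.nn as nn

class KoopmanLatentStep(nn.Module):
    """Propagates the variational latent state with a learned linear
    Koopman operator: z_{t+1} = K z_t. Linearity in latent space is what
    makes the learned dynamics inspectable, e.g. via the eigenvalues
    and eigenvectors of K."""
    def __init__(self, latent_dim):
        super().__init__()
        self.K = nn.Parameter(torch.eye(latent_dim))

    def forward(self, z_t):
        return z_t @ self.K.t()

# A decoder then maps the propagated latent back to predicted vehicle
# states; a scenario is flagged as anomalous when the prediction error on
# observed behavior exceeds what the model incurs on normal driving.
```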
Multi-Modal Anomaly Detection for Unstructured and Uncertain Environments
Ji, Tianchen, Vuppala, Sri Theja, Chowdhary, Girish, Driggs-Campbell, Katherine
To achieve high levels of autonomy, modern robots require the ability to detect and recover from anomalies and failures with minimal human supervision. Multi-modal sensor signals could provide more information for such anomaly detection tasks; however, the fusion of high-dimensional and heterogeneous sensor modalities remains a challenging problem. We propose a deep neural network, the supervised variational autoencoder (SVAE), for failure identification in unstructured and uncertain environments. Our model leverages the representational power of VAEs to extract robust features from high-dimensional inputs for supervised learning tasks. The training objective unifies the generative model and the discriminative model, making learning a one-stage procedure. Our experiments on real field robot data demonstrate superior failure identification performance compared to baseline methods and show that our model learns interpretable representations. Videos of our results are available on our website: https://sites.google.com/illinois.edu/supervised-vae.
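The one-stage objective described above can be sketched as a VAE ELBO on the sensor inputs plus a supervised classification term on failure labels; the Gaussian reconstruction likelihood (MSE) and the beta weighting below are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def svae_loss(x, recon_x, mu, logvar, logits, label, beta=1.0):
    """One-stage SVAE-style objective: generative ELBO terms plus a
    discriminative classification term, optimized jointly."""
    recon = F.mse_loss(recon_x, x, reduction="sum")                 # reconstruction
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL to N(0, I)
    clf = F.cross_entropy(logits, label, reduction="sum")           # failure classes
    return recon + beta * kld + clf
```

Because the classifier consumes the same latent code that the decoder must reconstruct from, the features are pushed to be both robust to sensor noise and discriminative for failure identification, in a single training stage.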