Collaborating Authors

Padir, Taskin


Analysis and Perspectives on the ANA Avatar XPRIZE Competition

arXiv.org Artificial Intelligence

The ANA Avatar XPRIZE was a four-year competition to develop a robotic "avatar" system that allows a human operator to sense, communicate, and act in a remote environment as though physically present. The competition featured a unique requirement that judges would operate the avatars after less than one hour of training on the human-machine interfaces, and avatar systems were judged on both objective and subjective scoring metrics. This paper presents a unified summary and analysis of the competition from technical, judging, and organizational perspectives. We study the telerobotics technologies and innovations pursued by the competing teams in their avatar systems, and we correlate the use of these technologies with judges' task performance and subjective survey ratings. We also summarize perspectives from team leads, judges, and organizers about the competition's execution and impact to inform the future development of telerobotics and telepresence.
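
As a loose illustration of the technology-score correlation analysis described above (the teams, technologies, and numbers here are invented, not results from the paper), one might compute point-biserial correlations between binary technology-usage indicators and composite judge scores:

    # Hypothetical sketch of correlating technology usage with judge scores.
    # The technology list, usage matrix, and scores are illustrative only.
    import numpy as np
    from scipy.stats import pointbiserialr

    technologies = ["VR headset", "haptic gloves", "wheeled base"]
    # Rows: teams; columns: whether each team used the given technology.
    tech_usage = np.array([
        [1, 1, 0],
        [1, 0, 1],
        [0, 1, 1],
        [1, 1, 1],
    ])
    # Composite judge score per team (objective tasks + subjective surveys).
    scores = np.array([14.5, 11.0, 9.5, 15.0])

    for j, name in enumerate(technologies):
        r, p = pointbiserialr(tech_usage[:, j], scores)
        print(f"{name}: point-biserial r={r:.2f} (p={p:.2f})")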


Shared Affordance-awareness via Augmented Reality for Proactive Assistance in Human-robot Collaboration

arXiv.org Artificial Intelligence

Enabling humans and robots to collaborate effectively requires purposeful communication and an understanding of each other's affordances. Prior work in human-robot collaboration has incorporated knowledge of human affordances, i.e., their action possibilities in the current context, into autonomous robot decision-making. This "affordance awareness" is especially promising for service robots that need to know when and how to assist a person who cannot independently complete a task. However, robots still fall short in performing many common tasks autonomously. In this work-in-progress paper, we propose an augmented reality (AR) framework that bridges the gap in an assistive robot's capabilities by actively engaging with a human through a shared affordance-awareness representation. Leveraging the different perspectives from a human wearing an AR headset and a robot's equipped sensors, we can build a perceptual representation of the shared environment and model each agent's affordance regions. The AR interface also allows both agents to communicate affordances with one another and to prompt for assistance when attempting an action outside their own affordance region. This paper presents the main components of the proposed framework and discusses its potential through a domestic cleaning task experiment.
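
A minimal sketch of what a shared affordance-awareness representation could look like, assuming a simple 2D grid map of the shared environment; all class and method names here are illustrative, not the paper's implementation:

    # Illustrative shared affordance map: grid cell -> set of agents that
    # can act there. Names and structure are assumptions, not the paper's.
    from dataclasses import dataclass, field

    @dataclass
    class SharedAffordanceMap:
        affordances: dict = field(default_factory=dict)

        def register(self, agent: str, cell: tuple) -> None:
            self.affordances.setdefault(cell, set()).add(agent)

        def can_act(self, agent: str, cell: tuple) -> bool:
            return agent in self.affordances.get(cell, set())

        def assistance_needed(self, agent: str, cell: tuple) -> bool:
            # True if another agent can act at the cell but this one cannot,
            # i.e., the action lies outside this agent's affordance region.
            others = self.affordances.get(cell, set()) - {agent}
            return not self.can_act(agent, cell) and bool(others)

    shared = SharedAffordanceMap()
    shared.register("human", (3, 4))   # human can reach the shelf at (3, 4)
    shared.register("robot", (0, 1))   # robot can reach the floor at (0, 1)
    print(shared.assistance_needed("robot", (3, 4)))  # True: prompt the human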


TRASH: Tandem Rover and Aerial Scrap Harvester

arXiv.org Artificial Intelligence

Addressing the challenge of roadside litter in the United States, which has traditionally relied on costly and ineffective manual cleanup methods, this paper presents an autonomous multi-robot system for highway litter monitoring and collection. Our solution pairs an aerial vehicle, which scans highway stretches and gathers data, with a terrestrial robot equipped with a Convolutional Neural Network (CNN) for litter detection and mapping. Upon detecting litter, the ground robot navigates to each pinpointed location, re-assesses the vicinity, and employs a "greedy pickup" approach to address potential mapping inaccuracies or litter misplacements. Through simulation studies and real-world robotic trials, this work highlights the potential of our proposed system for highway cleanliness and management in the context of Robotics, Automation, and Artificial Intelligence.
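
As a rough sketch of the "greedy pickup" loop described above (the helper functions navigate_to, rescan_vicinity, and pick_up are hypothetical stand-ins for the robot's navigation and manipulation stack):

    # Illustrative "greedy pickup" loop: repeatedly visit the nearest mapped
    # litter location, then re-scan locally to absorb mapping error.
    import math

    def nearest(pose, targets):
        return min(targets, key=lambda t: math.dist(pose, t))

    def greedy_pickup(robot_pose, litter_map, navigate_to, rescan_vicinity, pick_up):
        remaining = set(litter_map)
        while remaining:
            target = nearest(robot_pose, remaining)
            robot_pose = navigate_to(target)  # assumed to return the new pose
            # Mapped detections may be misplaced; trust the fresh local scan.
            for item in rescan_vicinity(robot_pose):
                pick_up(item)
            remaining.discard(target)
        return robot_pose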


Mobile MoCap: Retroreflector Localization On-The-Go

arXiv.org Artificial Intelligence

Motion capture based on tracking retroreflectors provides the highly accurate pose estimation frequently used in robotics. Unlike commercial motion capture systems, fiducial marker-based tracking methods, such as AprilTags, can perform relative localization without requiring a static camera setup. However, popular pose estimation methods based on fiducial markers have lower localization accuracy than commercial motion capture systems. We propose Mobile MoCap, a system that utilizes inexpensive near-infrared cameras for accurate relative localization even while in motion. We present a retroreflector feature detector that performs 6-DoF (six degrees-of-freedom) tracking and operates with minimal camera exposure times to reduce motion blur. To evaluate the proposed localization technique while in motion, we mount our Mobile MoCap system, as well as an RGB camera to benchmark against fiducial markers, onto a precision-controlled linear rail and servo. The fiducial marker approach employs AprilTags, which are pervasively used for localization in robotics. We evaluate the two systems at varying distances, marker viewing angles, and relative velocities. Across all experimental conditions, our stereo-based Mobile MoCap system obtains higher position and orientation accuracy than the fiducial approach. The code for Mobile MoCap is implemented in ROS 2 and made publicly available at https://github.com/RIVeR-Lab/mobile_mocap.
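
The core detection-and-triangulation idea can be sketched with OpenCV as follows; this is a simplified illustration under assumed calibration inputs (P_left and P_right from a prior stereo calibration), not the published ROS 2 implementation linked above:

    # Sketch of retroreflector blob detection and stereo triangulation.
    # Correspondence matching between the two views is omitted for brevity.
    import cv2
    import numpy as np

    def detect_blobs(ir_image, thresh=200):
        # With minimal exposure, retroreflectors dominate the near-IR image,
        # so a simple intensity threshold isolates them as bright blobs.
        _, mask = cv2.threshold(ir_image, thresh, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        centers = []
        for c in contours:
            m = cv2.moments(c)
            if m["m00"] > 0:  # centroid of each blob
                centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
        return np.array(centers, dtype=np.float64)

    def triangulate(P_left, P_right, pts_left, pts_right):
        # OpenCV expects 2xN point arrays; returns homogeneous 4xN coords.
        X = cv2.triangulatePoints(P_left, P_right, pts_left.T, pts_right.T)
        return (X[:3] / X[3]).T  # Nx3 points in the left camera frame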


StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks

arXiv.org Artificial Intelligence

Obstacle detection is a safety-critical problem in robot navigation, where stereo matching is a popular vision-based approach. While deep neural networks have shown impressive results in computer vision, most previous obstacle detection works leverage only traditional stereo matching techniques to meet the computational constraints for real-time feedback. This paper proposes a computationally efficient method that employs a deep neural network to detect occupancy from stereo images directly. Instead of learning the point cloud correspondence from the stereo data, our approach extracts a compact obstacle distribution based on volumetric representations. In addition, we prune the computation of safety-irrelevant spaces in a coarse-to-fine manner based on octrees generated by the decoder. As a result, we achieve real-time performance on an onboard computer (NVIDIA Jetson TX2). Our approach accurately detects obstacles at ranges of up to 32 meters and achieves better IoU (Intersection over Union) and CD (Chamfer Distance) scores with only 2% of the computation cost of the state-of-the-art stereo model. Furthermore, we validate our method's robustness and real-world feasibility through autonomous navigation experiments with a real robot. Hence, our work contributes toward closing the gap between stereo-based systems in robot perception and state-of-the-art stereo models in computer vision. To counter the scarcity of high-quality real-world indoor stereo datasets, we collect a 1.36-hour stereo dataset with a mobile robot, which is used to fine-tune our model. The dataset, the code, and further details including additional visualizations are available at https://lhy.xyz/stereovoxelnet
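
A conceptual sketch of the coarse-to-fine pruning idea, in which only voxels predicted occupied at a coarse level are subdivided and re-evaluated; predict_occupancy stands in for a per-level decoder head and is an assumption for illustration, not the paper's actual architecture:

    # Coarse-to-fine occupancy with octree-like refinement: empty regions
    # are pruned at coarse levels and never evaluated again.
    import numpy as np

    def coarse_to_fine(predict_occupancy, levels=3, base=4):
        # Start with every coarse cell active; a cell is an (i, j, k) index.
        active = [(i, j, k) for i in range(base)
                  for j in range(base) for k in range(base)]
        for level in range(levels):
            occupied = [c for c in active if predict_occupancy(level, c) > 0.5]
            if level == levels - 1:
                return occupied
            # Subdivide each occupied cell into its 8 children.
            active = [(2 * i + di, 2 * j + dj, 2 * k + dk)
                      for (i, j, k) in occupied
                      for di in (0, 1) for dj in (0, 1) for dk in (0, 1)]

    # Toy occupancy oracle: a ball of obstacles near the grid corner.
    def toy_predictor(level, cell):
        size = 4 * 2 ** level
        center = np.array(cell, dtype=float) / size
        return 1.0 if np.linalg.norm(center) < 0.4 else 0.0

    print(len(coarse_to_fine(toy_predictor)))  # number of occupied leaf voxels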


End-to-end grasping policies for human-in-the-loop robots via deep reinforcement learning

arXiv.org Artificial Intelligence

State-of-the-art human-in-the-loop robot grasping suffers greatly from Electromyography (EMG) inference robustness issues. As a workaround, researchers have integrated EMG with other signals, often in an ad hoc manner. In this paper, we present a method for end-to-end training of a policy for human-in-the-loop robot grasping on real reaching trajectories. For this purpose, we use Reinforcement Learning (RL) and Imitation Learning (IL) in DEXTRON (DEXTerity enviRONment), a stochastic simulation environment with real human trajectories that are augmented and selected using a Monte Carlo (MC) simulation method. We also offer a success model which, once trained on the expert policy data and the RL policy roll-out transitions, provides transparency into how the deep policy works and when it is likely to fail.
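
The success model idea can be sketched as a binary classifier over roll-out transitions, labeled by eventual episode outcome; the feature layout, model choice, and data below are assumptions for illustration only:

    # Toy success model: predicts from a state whether the episode passing
    # through it is likely to succeed, as a transparency signal.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # Toy features: [hand-object distance, approach speed, EMG activation].
    X = rng.normal(size=(500, 3))
    # Toy labels: episodes fail more often when the hand is far and fast.
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) < 0).astype(int)

    success_model = LogisticRegression().fit(X, y)

    def predicted_success(state_features):
        # Probability that an episode through this state will succeed.
        return success_model.predict_proba(np.atleast_2d(state_features))[0, 1]

    print(f"P(success) = {predicted_success([0.1, -0.2, 0.3]):.2f}")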


Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

arXiv.org Artificial Intelligence

For lower arm amputees, robotic prosthetic hands offer the promise of regaining the capability to perform fine object manipulation in activities of daily living. Current control methods based on physiological signals such as EEG and EMG are prone to poor inference outcomes due to motion artifacts, variability of skin-electrode junction impedance over time, muscle fatigue, and other factors. Visual evidence is also susceptible to its own artifacts, most often due to object occlusion, lighting changes, and variable object shapes depending on view angle, among other factors. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach due to the complementary strengths of these modalities. In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, gaze, and EMG from the forearm processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train the neural network components. Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths, and as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time. Specifically, results indicate that, on average, fusion improves the instantaneous upcoming grasp type classification accuracy during the reaching phase by 13.66% and 14.8%, relative to EMG and visual evidence individually. An overall fusion accuracy of 95.3% among 13 labels (compared to a chance level of 7.7%) is achieved, and a more detailed analysis indicates that the correct grasp is inferred sufficiently early and with high confidence compared to the top contender, in order to allow successful robot actuation to close the loop.
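
The fusion rule itself can be sketched compactly: assuming conditionally independent modalities, the posterior over grasp types is proportional to the prior times the per-modality likelihoods. The three-class setup and numbers below are illustrative only (the paper uses 13 grasp labels):

    # Minimal Bayesian evidence fusion over grasp types in log space.
    import numpy as np

    def fuse(prior, *likelihoods):
        log_post = np.log(prior)
        for lik in likelihoods:
            log_post += np.log(lik)  # independence assumption across modalities
        post = np.exp(log_post - log_post.max())  # stabilize before normalizing
        return post / post.sum()

    prior = np.full(3, 1 / 3)                # uniform prior over grasp types
    emg_lik = np.array([0.5, 0.3, 0.2])      # EMG network output
    vision_lik = np.array([0.2, 0.7, 0.1])   # eye-view video + gaze output
    print(fuse(prior, emg_lik, vision_lik))  # fused posterior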


Development of a Laboratory Kit for Robotics Engineering Education

AAAI Conferences

This paper discusses the development of a sequence of undergraduate courses forming the core curriculum in the Robotics Engineering (RBE) B.S. program at Worcester Polytechnic Institute (WPI). The laboratory robotics kit developed for the junior-level courses is presented in detail. The platform is designed to be modular and cost-effective, making it suitable for laboratory-based robotics education. The system is ideal not only for undergraduate coursework but may also be adapted for graduate and undergraduate research, as well as for exposing K-12 students to STEM.