Goto

Collaborating Authors

 perception pipeline


CODEI: Resource-Efficient Task-Driven Co-Design of Perception and Decision Making for Mobile Robots Applied to Autonomous Vehicles

Milojevic, Dejan, Zardini, Gioele, Elser, Miriam, Censi, Andrea, Frazzoli, Emilio

arXiv.org Artificial Intelligence

This paper discusses the integration challenges and strategies for designing mobile robots, by focusing on the task-driven, optimal selection of hardware and software to balance safety, efficiency, and minimal usage of resources such as costs, energy, computational requirements, and weight. We emphasize the interplay between perception and motion planning in decision-making by introducing the concept of occupancy queries to quantify the perception requirements for sampling-based motion planners. Sensor and algorithm performance are evaluated using False Negative Rates (FPR) and False Positive Rates (FPR) across various factors such as geometric relationships, object properties, sensor resolution, and environmental conditions. By integrating perception requirements with perception performance, an Integer Linear Programming (ILP) approach is proposed for efficient sensor and algorithm selection and placement. This forms the basis for a co-design optimization that includes the robot body, motion planner, perception pipeline, and computing unit. We refer to this framework for solving the co-design problem of mobile robots as CODEI, short for Co-design of Embodied Intelligence. A case study on developing an Autonomous Vehicle (AV) for urban scenarios provides actionable information for designers, and shows that complex tasks escalate resource demands, with task performance affecting choices of the autonomy stack. The study demonstrates that resource prioritization influences sensor choice: cameras are preferred for cost-effective and lightweight designs, while lidar sensors are chosen for better energy and computational efficiency.


Towards Fluorescence-Guided Autonomous Robotic Partial Nephrectomy on Novel Tissue-Mimicking Hydrogel Phantoms

Kilmer, Ethan, Chen, Joseph, Ge, Jiawei, Sarda, Preksha, Cha, Richard, Cleary, Kevin, Shepard, Lauren, Ghazi, Ahmed Ezzat, Scheikl, Paul Maria, Krieger, Axel

arXiv.org Artificial Intelligence

Autonomous robotic systems hold potential for improving renal tumor resection accuracy and patient outcomes. We present a fluorescence-guided robotic system capable of planning and executing incision paths around exophytic renal tumors with a clinically relevant resection margin. Leveraging point cloud observations, the system handles irregular tumor shapes and distinguishes healthy from tumorous tissue based on near-infrared imaging, akin to indocyanine green staining in partial nephrectomy. Tissue-mimicking phantoms are crucial for the development of autonomous robotic surgical systems for interventions where acquiring ex-vivo animal tissue is infeasible, such as cancer of the kidney and renal pelvis. To this end, we propose novel hydrogel-based kidney phantoms with exophytic tumors that mimic the physical and visual behavior of tissue, and are compatible with electrosurgical instruments, a common limitation of silicone-based phantoms. In contrast to previous hydrogel phantoms, we mix the material with near-infrared dye to enable fluorescence-guided tumor segmentation. Autonomous real-world robotic experiments validate our system and phantoms, achieving an average margin accuracy of 1.44 mm in a completion time of 69 sec.


An Event-Based Perception Pipeline for a Table Tennis Robot

Ziegler, Andreas, Gossard, Thomas, Glover, Arren, Zell, Andreas

arXiv.org Artificial Intelligence

Table tennis robots gained traction over the last years and have become a popular research challenge for control and perception algorithms. Fast and accurate ball detection is crucial for enabling a robotic arm to rally the ball back successfully. So far, most table tennis robots use conventional, frame-based cameras for the perception pipeline. However, frame-based cameras suffer from motion blur if the frame rate is not high enough for fast-moving objects. Event-based cameras, on the other hand, do not have this drawback since pixels report changes in intensity asynchronously and independently, leading to an event stream with a temporal resolution on the order of us. To the best of our knowledge, we present the first real-time perception pipeline for a table tennis robot that uses only event-based cameras. We show that compared to a frame-based pipeline, event-based perception pipelines have an update rate which is an order of magnitude higher. This is beneficial for the estimation and prediction of the ball's position, velocity, and spin, resulting in lower mean errors and uncertainties. These improvements are an advantage for the robot control, which has to be fast, given the short time a table tennis ball is flying until the robot has to hit back.


RoMu4o: A Robotic Manipulation Unit For Orchard Operations Automating Proximal Hyperspectral Leaf Sensing

Mortazavi, Mehrad, Cappelleri, David J., Ehsani, Reza

arXiv.org Artificial Intelligence

Driven by the need to address labor shortages and meet the demands of a rapidly growing population, robotic automation has become a critical component in precision agriculture. Leaf-level hyperspectral spectroscopy is shown to be a powerful tool for phenotyping, monitoring crop health, identifying essential nutrients within plants as well as detecting diseases and water stress. This work introduces RoMu4o, a robotic manipulation unit for orchard operations offering an automated solution for proximal hyperspectral leaf sensing. This ground robot is equipped with a 6DOF robotic arm and vision system for real-time deep learning-based image processing and motion planning. We developed robust perception and manipulation pipelines that enable the robot to successfully grasp target leaves and perform spectroscopy. These frameworks operate synergistically to identify and extract the 3D structure of leaves from an observed batch of foliage, propose 6D poses, and generate collision-free constraint-aware paths for precise leaf manipulation. The end-effector of the arm features a compact design that integrates an independent lighting source with a hyperspectral sensor, enabling high-fidelity data acquisition while streamlining the calibration process for accurate measurements. Our ground robot is engineered to operate in unstructured orchard environments. However, the performance of the system is evaluated in both indoor and outdoor plant models. The system demonstrated reliable performance for 1-LPB hyperspectral sampling, achieving 95% success rate in lab trials and 79% in field trials. Field experiments revealed an overall success rate of 70% for autonomous leaf grasping and hyperspectral measurement in a pistachio orchard. The open-source repository is available at: https://github.com/mehradmrt/UCM-AgBot-ROS2


Combining Local and Global Perception for Autonomous Navigation on Nano-UAVs

Lamberti, Lorenzo, Rutishauser, Georg, Conti, Francesco, Benini, Luca

arXiv.org Artificial Intelligence

A critical challenge in deploying unmanned aerial vehicles (UAVs) for autonomous tasks is their ability to navigate in an unknown environment. This paper introduces a novel vision-depth fusion approach for autonomous navigation on nano-UAVs. We combine the visual-based PULP-Dronet convolutional neural network for semantic information extraction, i.e., serving as the global perception, with 8x8px depth maps for close-proximity maneuvers, i.e., the local perception. When tested in-field, our integration strategy highlights the complementary strengths of both visual and depth sensory information. We achieve a 100% success rate over 15 flights in a complex navigation scenario, encompassing straight pathways, static obstacle avoidance, and 90{\deg} turns.


RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

Chen, Junting, Mu, Yao, Yu, Qiaojun, Wei, Tianming, Wu, Silang, Yuan, Zhecheng, Liang, Zhixuan, Yang, Chao, Zhang, Kaipeng, Shao, Wenqi, Qiao, Yu, Xu, Huazhe, Ding, Mingyu, Luo, Ping

arXiv.org Artificial Intelligence

Rapid progress in high-level task planning and code generation for open-world robot manipulation has been witnessed in Embodied AI. However, previous studies put much effort into general common sense reasoning and task planning capabilities of large-scale language or multi-modal models, relatively little effort on ensuring the deployability of generated code on real robots, and other fundamental components of autonomous robot systems including robot perception, motion planning, and control. To bridge this ``ideal-to-real'' gap, this paper presents \textbf{RobotScript}, a platform for 1) a deployable robot manipulation pipeline powered by code generation; and 2) a code generation benchmark for robot manipulation tasks in free-form natural language. The RobotScript platform addresses this gap by emphasizing the unified interface with both simulation and real robots, based on abstraction from the Robot Operating System (ROS), ensuring syntax compliance and simulation validation with Gazebo. We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms, and multiple grippers. Additionally, our benchmark assesses reasoning abilities for physical space and constraints, highlighting the differences between GPT-3.5, GPT-4, and Gemini in handling complex physical interactions. Finally, we present a thorough evaluation on the whole system, exploring how each module in the pipeline: code generation, perception, motion planning, and even object geometric properties, impact the overall performance of the system.


Refining Perception Contracts: Case Studies in Vision-based Safe Auto-landing

Li, Yangge, Yang, Benjamin C, Jia, Yixuan, Zhuang, Daniel, Mitra, Sayan

arXiv.org Artificial Intelligence

Perception contracts provide a method for evaluating safety of control systems that use machine learning for perception. A perception contract is a specification for testing the ML components, and it gives a method for proving end-to-end system-level safety requirements. The feasibility of contract-based testing and assurance was established earlier in the context of straight lane keeping--a 3-dimensional system with relatively simple dynamics. This paper presents the analysis of two 6 and 12-dimensional flight control systems that use multi-stage, heterogeneous, ML-enabled perception. The paper advances methodology by introducing an algorithm for constructing data and requirement guided refinement of perception contracts (DaRePC). The resulting analysis provides testable contracts which establish the state and environment conditions under which an aircraft can safety touchdown on the runway and a drone can safely pass through a sequence of gates. It can also discover conditions (e.g., low-horizon sun) that can possibly violate the safety of the vision-based control system.


Learning Depth Vision-Based Personalized Robot Navigation From Dynamic Demonstrations in Virtual Reality

de Heuvel, Jorge, Corral, Nathan, Kreis, Benedikt, Conradi, Jacobus, Driemel, Anne, Bennewitz, Maren

arXiv.org Artificial Intelligence

For the best human-robot interaction experience, the robot's navigation policy should take into account personal preferences of the user. In this paper, we present a learning framework complemented by a perception pipeline to train a depth vision-based, personalized navigation controller from user demonstrations. Our virtual reality interface enables the demonstration of robot navigation trajectories under motion of the user for dynamic interaction scenarios. The novel perception pipeline enrolls a variational autoencoder in combination with a motion predictor. It compresses the perceived depth images to a latent state representation to enable efficient reasoning of the learning agent about the robot's dynamic environment. In a detailed analysis and ablation study, we evaluate different configurations of the perception pipeline. To further quantify the navigation controller's quality of personalization, we develop and apply a novel metric to measure preference reflection based on the Fr\'echet Distance. We discuss the robot's navigation performance in various virtual scenes and demonstrate the first personalized robot navigation controller that solely relies on depth images. A supplemental video highlighting our approach is available online.


RL-DWA Omnidirectional Motion Planning for Person Following in Domestic Assistance and Monitoring

Eirale, Andrea, Martini, Mauro, Chiaberge, Marcello

arXiv.org Artificial Intelligence

In recent years, population ageing and pandemics have been demonstrated to cause isolation of older adults in their houses, generating the need for a reliable assistive figure. Service robotics recently emerged as high-tech support to the problem, providing a series of aid functionality to satisfy daily indoor assistance. Robotic solutions take care of interactive social aspects [1] or monitoring the health status of the user [2, 3]. Domestic environments are often very demanding for autonomous navigation systems due to the variety of complex and dynamic obstacles they can feature. To this end, the robot platform shall provide extreme flexibility and effective mobility to handle narrow passages thought for humans. Moreover, in order to properly assist the user, the platform should be able to follow them within this environment. Person following [4, 5] is the first step to enable any visual or vocal interaction with the user while monitoring its condition to intervene earlier in the case of anomalous events. Person following systems are often based on naive visual-control strategy, directly coupling the generation of heuristic commands for the robot with the person coordinate in the image [6]. Deep Reinforcement Learning (DRL) agents recently demonstrated significant autonomy and flexibility boost in robotic solutions.


A Certifiable Security Patch for Object Tracking in Self-Driving Systems via Historical Deviation Modeling

Pan, Xudong, Xiao, Qifan, Zhang, Mi, Yang, Min

arXiv.org Machine Learning

Self-driving cars (SDC) commonly implement the perception pipeline to detect the surrounding obstacles and track their moving trajectories, which lays the ground for the subsequent driving decision making process. Although the security of obstacle detection in SDC is intensively studied, not until very recently the attackers start to exploit the vulnerability of the tracking module. Compared with solely attacking the object detectors, this new attack strategy influences the driving decision more effectively with less attack budgets. However, little is known on whether the revealed vulnerability remains effective in end-to-end self-driving systems and, if so, how to mitigate the threat. In this paper, we present the first systematic research on the security of object tracking in SDC. Through a comprehensive case study on the full perception pipeline of a popular open-sourced self-driving system, Baidu's Apollo, we prove the mainstream multi-object tracker (MOT) based on Kalman Filter (KF) is unsafe even with an enabled multi-sensor fusion mechanism. Our root cause analysis reveals, the vulnerability is innate to the design of KF-based MOT, which shall error-handle the prediction results from the object detectors yet the adopted KF algorithm is prone to trust the observation more when its deviation from the prediction is larger. To address this design flaw, we propose a simple yet effective security patch for KF-based MOT, the core of which is an adaptive strategy to balance the focus of KF on observations and predictions according to the anomaly index of the observation-prediction deviation, and has certified effectiveness against a generalized hijacking attack model. Extensive evaluation on $4$ KF-based existing MOT implementations (including 2D and 3D, academic and Apollo ones) validate the defense effectiveness and the trivial performance overhead of our approach.