AITopics | perception pipeline

CODEI: Resource-Efficient Task-Driven Co-Design of Perception and Decision Making for Mobile Robots Applied to Autonomous Vehicles

Milojevic, Dejan, Zardini, Gioele, Elser, Miriam, Censi, Andrea, Frazzoli, Emilio

arXiv.org Artificial Intelligence, Mar-13-2025

This paper discusses the integration challenges and strategies for designing mobile robots, by focusing on the task-driven, optimal selection of hardware and software to balance safety, efficiency, and minimal usage of resources such as costs, energy, computational requirements, and weight. We emphasize the interplay between perception and motion planning in decision-making by introducing the concept of occupancy queries to quantify the perception requirements for sampling-based motion planners. Sensor and algorithm performance are evaluated using False Negative Rates (FNR) and False Positive Rates (FPR) across various factors such as geometric relationships, object properties, sensor resolution, and environmental conditions. By integrating perception requirements with perception performance, an Integer Linear Programming (ILP) approach is proposed for efficient sensor and algorithm selection and placement. This forms the basis for a co-design optimization that includes the robot body, motion planner, perception pipeline, and computing unit. We refer to this framework for solving the co-design problem of mobile robots as CODEI, short for Co-design of Embodied Intelligence. A case study on developing an Autonomous Vehicle (AV) for urban scenarios provides actionable information for designers, and shows that complex tasks escalate resource demands, with task performance affecting choices of the autonomy stack. The study demonstrates that resource prioritization influences sensor choice: cameras are preferred for cost-effective and lightweight designs, while lidar sensors are chosen for better energy and computational efficiency.
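The selection step described in this abstract can be illustrated with a toy covering problem. This is a minimal sketch and not the CODEI formulation: the sensor names, costs, and query labels are hypothetical, and an exhaustive search stands in for an ILP solver, which is only practical for a handful of candidates.

```python
from itertools import combinations

# Hypothetical catalog (all values invented for illustration): each sensor has
# a cost and the set of occupancy queries it can answer with acceptable
# false negative and false positive rates.
SENSORS = {
    "camera_front": {"cost": 1.0, "covers": {"q1", "q2"}},
    "camera_rear": {"cost": 1.0, "covers": {"q3"}},
    "lidar_roof": {"cost": 5.0, "covers": {"q1", "q2", "q3", "q4"}},
    "radar_front": {"cost": 2.0, "covers": {"q2", "q4"}},
}
QUERIES = {"q1", "q2", "q3", "q4"}


def select_sensors(sensors, queries):
    """Return the cheapest subset of sensors that covers every query.

    Enumerates all binary selections, which matches the set-cover ILP
    "minimize total cost subject to every query being covered".
    """
    best, best_cost = None, float("inf")
    names = sorted(sensors)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            covered = set().union(*(sensors[s]["covers"] for s in subset))
            cost = sum(sensors[s]["cost"] for s in subset)
            if queries <= covered and cost < best_cost:
                best, best_cost = subset, cost
    return best, best_cost


selection, total_cost = select_sensors(SENSORS, QUERIES)
```

With these invented numbers the three cheaper sensors together beat the single lidar on cost. Changing the objective to energy or compute would only require swapping the `cost` field.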

algorithm, configuration, robot, (13 more...)

2503.10296

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Germany (0.14)

Genre: Research Report (1.00)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Towards Fluorescence-Guided Autonomous Robotic Partial Nephrectomy on Novel Tissue-Mimicking Hydrogel Phantoms

Kilmer, Ethan, Chen, Joseph, Ge, Jiawei, Sarda, Preksha, Cha, Richard, Cleary, Kevin, Shepard, Lauren, Ghazi, Ahmed Ezzat, Scheikl, Paul Maria, Krieger, Axel

arXiv.org Artificial Intelligence, Mar-3-2025

Autonomous robotic systems hold potential for improving renal tumor resection accuracy and patient outcomes. We present a fluorescence-guided robotic system capable of planning and executing incision paths around exophytic renal tumors with a clinically relevant resection margin. Leveraging point cloud observations, the system handles irregular tumor shapes and distinguishes healthy from tumorous tissue based on near-infrared imaging, akin to indocyanine green staining in partial nephrectomy. Tissue-mimicking phantoms are crucial for the development of autonomous robotic surgical systems for interventions where acquiring ex-vivo animal tissue is infeasible, such as cancer of the kidney and renal pelvis. To this end, we propose novel hydrogel-based kidney phantoms with exophytic tumors that mimic the physical and visual behavior of tissue, and are compatible with electrosurgical instruments, a common limitation of silicone-based phantoms. In contrast to previous hydrogel phantoms, we mix the material with near-infrared dye to enable fluorescence-guided tumor segmentation. Autonomous real-world robotic experiments validate our system and phantoms, achieving an average margin accuracy of 1.44 mm in a completion time of 69 sec.

incision, kidney phantom, tumor, (13 more...)

2503.02265

Country:

North America > United States (1.00)
Europe > Norway (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Kidney Cancer (0.49)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

An Event-Based Perception Pipeline for a Table Tennis Robot

Ziegler, Andreas, Gossard, Thomas, Glover, Arren, Zell, Andreas

arXiv.org Artificial Intelligence, Feb-2-2025

Table tennis robots have gained traction in recent years and have become a popular research challenge for control and perception algorithms. Fast and accurate ball detection is crucial for enabling a robotic arm to rally the ball back successfully. So far, most table tennis robots use conventional, frame-based cameras for the perception pipeline. However, frame-based cameras suffer from motion blur if the frame rate is not high enough for fast-moving objects. Event-based cameras, on the other hand, do not have this drawback since pixels report changes in intensity asynchronously and independently, leading to an event stream with a temporal resolution on the order of µs. To the best of our knowledge, we present the first real-time perception pipeline for a table tennis robot that uses only event-based cameras. We show that compared to a frame-based pipeline, event-based perception pipelines have an update rate which is an order of magnitude higher. This is beneficial for the estimation and prediction of the ball's position, velocity, and spin, resulting in lower mean errors and uncertainties. These improvements are an advantage for the robot control, which has to be fast, given the short time a table tennis ball is flying until the robot has to hit back.
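To make the idea of an asynchronous event stream concrete, the sketch below groups timestamped events into short time windows and reports one position estimate per window. It is an illustration under stated assumptions, not the authors' detector: the `Event` type, the centroid estimate, and the window length are all hypothetical. The point is only that the update rate is set by the chosen window rather than by a camera frame rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    t_us: int      # timestamp in microseconds
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 for an intensity increase, -1 for a decrease


def window_centroids(events, window_us):
    """Group time-sorted events into fixed windows.

    Returns a list of (window_end_us, mean_x, mean_y), one entry per
    non-empty window. Empty windows produce no estimate.
    """
    results = []
    if not events:
        return results
    window_start = events[0].t_us
    xs, ys = [], []
    for ev in events:
        # Close every window that ends before this event arrives.
        while ev.t_us >= window_start + window_us:
            if xs:
                results.append(
                    (window_start + window_us, sum(xs) / len(xs), sum(ys) / len(ys))
                )
                xs, ys = [], []
            window_start += window_us
        xs.append(ev.x)
        ys.append(ev.y)
    if xs:  # flush the last, possibly partial, window
        results.append(
            (window_start + window_us, sum(xs) / len(xs), sum(ys) / len(ys))
        )
    return results


demo = [Event(0, 10, 10, 1), Event(50, 12, 10, 1),
        Event(120, 20, 14, -1), Event(180, 22, 14, 1)]
estimates = window_centroids(demo, window_us=100)
```

Here four events spread over 180 µs yield two position estimates, one per 100 µs window.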

artificial intelligence, machine learning, pipeline, (18 more...)

2502.00749

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > Italy (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports > Tennis (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

RoMu4o: A Robotic Manipulation Unit For Orchard Operations Automating Proximal Hyperspectral Leaf Sensing

Mortazavi, Mehrad, Cappelleri, David J., Ehsani, Reza

arXiv.org Artificial Intelligence, Jan-17-2025

Driven by the need to address labor shortages and meet the demands of a rapidly growing population, robotic automation has become a critical component in precision agriculture. Leaf-level hyperspectral spectroscopy is shown to be a powerful tool for phenotyping, monitoring crop health, identifying essential nutrients within plants as well as detecting diseases and water stress. This work introduces RoMu4o, a robotic manipulation unit for orchard operations offering an automated solution for proximal hyperspectral leaf sensing. This ground robot is equipped with a 6DOF robotic arm and vision system for real-time deep learning-based image processing and motion planning. We developed robust perception and manipulation pipelines that enable the robot to successfully grasp target leaves and perform spectroscopy. These frameworks operate synergistically to identify and extract the 3D structure of leaves from an observed batch of foliage, propose 6D poses, and generate collision-free constraint-aware paths for precise leaf manipulation. The end-effector of the arm features a compact design that integrates an independent lighting source with a hyperspectral sensor, enabling high-fidelity data acquisition while streamlining the calibration process for accurate measurements. Our ground robot is engineered to operate in unstructured orchard environments. However, the performance of the system is evaluated in both indoor and outdoor plant models. The system demonstrated reliable performance for 1-LPB hyperspectral sampling, achieving 95% success rate in lab trials and 79% in field trials. Field experiments revealed an overall success rate of 70% for autonomous leaf grasping and hyperspectral measurement in a pistachio orchard. The open-source repository is available at: https://github.com/mehradmrt/UCM-AgBot-ROS2

artificial intelligence, experiment, machine learning, (18 more...)

2501.10621

Country:

North America > United States > California > Merced County > Merced (0.28)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry: Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Combining Local and Global Perception for Autonomous Navigation on Nano-UAVs

Lamberti, Lorenzo, Rutishauser, Georg, Conti, Francesco, Benini, Luca

arXiv.org Artificial Intelligence, Mar-18-2024

A critical challenge in deploying unmanned aerial vehicles (UAVs) for autonomous tasks is their ability to navigate in an unknown environment. This paper introduces a novel vision-depth fusion approach for autonomous navigation on nano-UAVs. We combine the visual-based PULP-Dronet convolutional neural network for semantic information extraction, i.e., serving as the global perception, with 8x8px depth maps for close-proximity maneuvers, i.e., the local perception. When tested in-field, our integration strategy highlights the complementary strengths of both visual and depth sensory information. We achieve a 100% success rate over 15 flights in a complex navigation scenario, encompassing straight pathways, static obstacle avoidance, and 90° turns.
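One way to picture the combination of global and local perception is a rule that lets the depth map override the network output when an obstacle is close. The sketch below is hypothetical and not the paper's integration strategy: it assumes the network yields a steering value in [-1, 1] (negative meaning left) and a collision probability, and the distance threshold and speed limit are invented.

```python
def fuse(steering, collision_prob, depth_map, stop_dist_m=0.5, max_speed=1.0):
    """Combine a CNN output (global) with an 8x8 depth map (local).

    Returns (forward_speed, steering). Depth values are in meters.
    """
    if len(depth_map) != 8 or any(len(row) != 8 for row in depth_map):
        raise ValueError("expected an 8x8 depth map")
    # Nearest obstacle in the left and right halves of the field of view.
    left = min(min(row[:4]) for row in depth_map)
    right = min(min(row[4:]) for row in depth_map)
    # Global perception: slow down as the predicted collision risk rises.
    speed = max_speed * (1.0 - collision_prob)
    if min(left, right) < stop_dist_m:
        # Local perception takes over: stop advancing and turn toward
        # the half with more clearance.
        speed = 0.0
        steering = -1.0 if left > right else 1.0
    return speed, steering


free = [[2.0] * 8 for _ in range(8)]
blocked_right = [row[:] for row in free]
blocked_right[3][6] = 0.3  # obstacle 0.3 m away on the right side
```

With the free map the network output passes through, scaled by collision risk. With the blocked map the rule commands a full left turn at zero forward speed.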

machine learning, natural language, perception pipeline, (16 more...)

2403.11661

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report (0.64)

Industry: Aerospace & Defense > Aircraft (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.51)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.35)

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

Chen, Junting, Mu, Yao, Yu, Qiaojun, Wei, Tianming, Wu, Silang, Yuan, Zhecheng, Liang, Zhixuan, Yang, Chao, Zhang, Kaipeng, Shao, Wenqi, Qiao, Yu, Xu, Huazhe, Ding, Mingyu, Luo, Ping

arXiv.org Artificial Intelligence, Feb-22-2024

Rapid progress in high-level task planning and code generation for open-world robot manipulation has been witnessed in Embodied AI. However, previous studies put much effort into general common sense reasoning and task planning capabilities of large-scale language or multi-modal models, and relatively little into ensuring the deployability of generated code on real robots and into other fundamental components of autonomous robot systems, including robot perception, motion planning, and control. To bridge this "ideal-to-real" gap, this paper presents RobotScript, a platform for 1) a deployable robot manipulation pipeline powered by code generation; and 2) a code generation benchmark for robot manipulation tasks in free-form natural language. The RobotScript platform addresses this gap by emphasizing the unified interface with both simulation and real robots, based on abstraction from the Robot Operating System (ROS), ensuring syntax compliance and simulation validation with Gazebo. We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms, and multiple grippers. Additionally, our benchmark assesses reasoning abilities for physical space and constraints, highlighting the differences between GPT-3.5, GPT-4, and Gemini in handling complex physical interactions. Finally, we present a thorough evaluation of the whole system, exploring how each module in the pipeline (code generation, perception, motion planning, and even object geometric properties) impacts the overall performance of the system.

large language model, machine learning, natural language, (19 more...)

2402.14623

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Texas > Schleicher County (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Refining Perception Contracts: Case Studies in Vision-based Safe Auto-landing

Li, Yangge, Yang, Benjamin C, Jia, Yixuan, Zhuang, Daniel, Mitra, Sayan

arXiv.org Artificial Intelligence, Nov-14-2023

Perception contracts provide a method for evaluating safety of control systems that use machine learning for perception. A perception contract is a specification for testing the ML components, and it gives a method for proving end-to-end system-level safety requirements. The feasibility of contract-based testing and assurance was established earlier in the context of straight lane keeping, a 3-dimensional system with relatively simple dynamics. This paper presents the analysis of two 6- and 12-dimensional flight control systems that use multi-stage, heterogeneous, ML-enabled perception. The paper advances methodology by introducing an algorithm for constructing data and requirement guided refinement of perception contracts (DaRePC). The resulting analysis provides testable contracts which establish the state and environment conditions under which an aircraft can safely touch down on the runway and a drone can safely pass through a sequence of gates. It can also discover conditions (e.g., low-horizon sun) that can possibly violate the safety of the vision-based control system.

artificial intelligence, contract, machine learning, (16 more...)

2311.08652

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > Switzerland (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.40)

Industry: Transportation > Air (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)

Learning Depth Vision-Based Personalized Robot Navigation From Dynamic Demonstrations in Virtual Reality

de Heuvel, Jorge, Corral, Nathan, Kreis, Benedikt, Conradi, Jacobus, Driemel, Anne, Bennewitz, Maren

arXiv.org Artificial Intelligence, Jul-31-2023

For the best human-robot interaction experience, the robot's navigation policy should take into account personal preferences of the user. In this paper, we present a learning framework complemented by a perception pipeline to train a depth vision-based, personalized navigation controller from user demonstrations. Our virtual reality interface enables the demonstration of robot navigation trajectories under motion of the user for dynamic interaction scenarios. The novel perception pipeline employs a variational autoencoder in combination with a motion predictor. It compresses the perceived depth images to a latent state representation to enable efficient reasoning of the learning agent about the robot's dynamic environment. In a detailed analysis and ablation study, we evaluate different configurations of the perception pipeline. To further quantify the navigation controller's quality of personalization, we develop and apply a novel metric to measure preference reflection based on the Fréchet distance. We discuss the robot's navigation performance in various virtual scenes and demonstrate the first personalized robot navigation controller that solely relies on depth images. A supplemental video highlighting our approach is available online.
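The abstract does not define the preference-reflection metric itself, only that it builds on the Fréchet distance. As background, here is a minimal sketch of the standard discrete Fréchet distance between two sampled trajectories, computed with the usual dynamic program. The function name and example trajectories are illustrative, and this is not claimed to be the authors' metric.

```python
import math


def discrete_frechet(p, q):
    """Discrete Frechet distance between two non-empty point sequences.

    ca[i][j] holds the smallest achievable "leash length" for walking
    p[0..i] and q[0..j] monotonically from start to end.
    """
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(
                    min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d
                )
    return ca[-1][-1]


demonstrated = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
executed = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
gap = discrete_frechet(demonstrated, executed)
```

Two parallel paths one unit apart give a distance of one. Unlike a pointwise average, the measure respects the order in which the points are visited.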

artificial intelligence, machine learning, robot, (17 more...)

2210.01683

Country:

Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

RL-DWA Omnidirectional Motion Planning for Person Following in Domestic Assistance and Monitoring

Eirale, Andrea, Martini, Mauro, Chiaberge, Marcello

arXiv.org Artificial Intelligence, Jan-13-2023

In recent years, population ageing and pandemics have been shown to cause isolation of older adults in their houses, generating the need for a reliable assistive figure. Service robotics recently emerged as high-tech support for this problem, providing a range of aid functions for daily indoor assistance. Robotic solutions take care of interactive social aspects [1] or monitor the health status of the user [2, 3]. Domestic environments are often very demanding for autonomous navigation systems due to the variety of complex and dynamic obstacles they can feature. To this end, the robot platform should provide extreme flexibility and effective mobility to handle narrow passages designed for humans. Moreover, in order to properly assist the user, the platform should be able to follow them within this environment. Person following [4, 5] is the first step to enable any visual or vocal interaction with the user while monitoring their condition to intervene earlier in the case of anomalous events. Person following systems are often based on a naive visual-control strategy, directly coupling the generation of heuristic commands for the robot with the person's coordinates in the image [6]. Deep Reinforcement Learning (DRL) agents recently demonstrated a significant boost in autonomy and flexibility in robotic solutions.

machine learning, platform, reinforcement learning, (19 more...)

2211.04993

Country:

Europe > Italy > Piedmont > Turin Province > Turin (0.05)
Asia > Taiwan (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

A Certifiable Security Patch for Object Tracking in Self-Driving Systems via Historical Deviation Modeling

Pan, Xudong, Xiao, Qifan, Zhang, Mi, Yang, Min

arXiv.org Machine Learning, Jul-18-2022

Self-driving cars (SDC) commonly implement the perception pipeline to detect the surrounding obstacles and track their moving trajectories, which lays the ground for the subsequent driving decision making process. Although the security of obstacle detection in SDC is intensively studied, only very recently have attackers started to exploit the vulnerability of the tracking module. Compared with solely attacking the object detectors, this new attack strategy influences the driving decision more effectively with a smaller attack budget. However, little is known about whether the revealed vulnerability remains effective in end-to-end self-driving systems and, if so, how to mitigate the threat. In this paper, we present the first systematic research on the security of object tracking in SDC. Through a comprehensive case study on the full perception pipeline of a popular open-sourced self-driving system, Baidu's Apollo, we prove the mainstream multi-object tracker (MOT) based on Kalman Filter (KF) is unsafe even with an enabled multi-sensor fusion mechanism. Our root cause analysis reveals that the vulnerability is innate to the design of KF-based MOT, which should handle errors in the prediction results from the object detectors, yet the adopted KF algorithm is prone to trust the observation more when its deviation from the prediction is larger. To address this design flaw, we propose a simple yet effective security patch for KF-based MOT, the core of which is an adaptive strategy to balance the focus of KF on observations and predictions according to the anomaly index of the observation-prediction deviation, and which has certified effectiveness against a generalized hijacking attack model. Extensive evaluation on 4 existing KF-based MOT implementations (including 2D and 3D, academic and Apollo ones) validates the defense effectiveness and the trivial performance overhead of our approach.
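The general idea of weighting an observation by how anomalous its deviation from the prediction is can be sketched with a scalar Kalman update. This is a generic innovation-gated variant written for illustration, not the patch proposed in the paper: the gate value and the rule for inflating the measurement noise are assumptions.

```python
def kf_update(x_pred, p_pred, z, r, gate=9.0):
    """Scalar Kalman update with innovation-based noise inflation.

    x_pred, p_pred: predicted state and its variance
    z, r:           observation and its nominal noise variance
    gate:           largest accepted normalized innovation squared
                    (hypothetical value)
    Returns (x_new, p_new, gain).
    """
    innovation = z - x_pred
    s = p_pred + r                   # innovation variance
    nis = innovation ** 2 / s        # normalized innovation squared
    if nis > gate:
        # Treat the observation as anomalous: inflate r until the
        # normalized innovation equals the gate, which shrinks the gain.
        r = innovation ** 2 / gate - p_pred
        s = p_pred + r
    gain = p_pred / s
    x_new = x_pred + gain * innovation
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, gain


normal = kf_update(x_pred=0.0, p_pred=1.0, z=1.0, r=1.0)
hijack = kf_update(x_pred=0.0, p_pred=1.0, z=30.0, r=1.0)
```

For the ordinary observation the gain stays at its nominal 0.5. For the far-off observation the gain drops to 0.01, so the estimate moves to about 0.3 instead of 15.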

artificial intelligence, machine learning, trajectory, (18 more...)

2207.08556

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.48)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
