Menegatti, Emanuele
Disentangled Iterative Surface Fitting for Contact-stable Grasp Planning
Yamanokuchi, Tomoya, Bacchin, Alberto, Olivastri, Emilio, Matsubara, Takamitsu, Menegatti, Emanuele
In this work, we address a limitation of surface fitting-based grasp planning algorithms: they primarily focus on geometric alignment between the gripper and the object surface while overlooking the stability of the contact point distribution, which often results in unstable grasps due to inadequate contact configurations. To overcome this limitation, we propose a novel surface fitting algorithm that integrates contact stability while preserving geometric compatibility. Inspired by human grasping behavior, our method disentangles the grasp pose optimization into three sequential steps: (1) rotation optimization to align contact normals, (2) translation refinement to improve Center of Mass (CoM) alignment, and (3) gripper aperture adjustment to optimize the contact point distribution. We validate our approach through simulations on ten YCB dataset objects, demonstrating an 80% improvement in grasp success over conventional surface fitting methods that disregard contact stability. Further details can be found on our project page: https://tomoya-yamanokuchi.github.io/disf-project-page/.
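To make the three-step decomposition concrete, the following is a minimal, self-contained Python sketch of one possible disentangled fitting loop. It is an illustration under assumed inputs (synthetic contact normals and contact points), not the authors' implementation: the rotation step uses a standard Kabsch alignment of contact normals, the translation step shifts the grasp center toward the object CoM, and the aperture step rescales the opening to span the contact points.

    import numpy as np

    def align_rotation(gripper_normals, object_normals):
        # Step (1): Kabsch/SVD rotation that best aligns gripper contact
        # normals with the object surface normals.
        H = gripper_normals.T @ object_normals
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    def refine_translation(grasp_center, object_com):
        # Step (2): shift the grasp center toward the object's center of mass.
        return object_com - grasp_center

    def adjust_aperture(contact_left, contact_right, margin=0.005):
        # Step (3): open the gripper just enough to span the two contact points.
        return float(np.linalg.norm(contact_left - contact_right)) + margin

    # Toy usage with synthetic contacts on a box-like object.
    gripper_normals = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
    object_normals = np.array([[0.9, 0.1, 0.0], [-0.9, -0.1, 0.0]])
    object_normals /= np.linalg.norm(object_normals, axis=1, keepdims=True)
    rotation = align_rotation(gripper_normals, object_normals)
    translation = refine_translation(np.zeros(3), np.array([0.02, 0.0, 0.05]))
    width = adjust_aperture(np.array([0.03, 0.0, 0.05]), np.array([-0.03, 0.0, 0.05]))
    print(rotation, translation, width)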
Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation
Fusaro, Daniel, Mosco, Simone, Menegatti, Emanuele, Pretto, Alberto
Semantic segmentation of point clouds is an essential task for understanding the environment in autonomous driving and robotics. Recent range-based works achieve real-time efficiency, while point- and voxel-based methods produce better results but suffer from high computational complexity. Moreover, highly complex deep learning models are often not suited to learning efficiently from small datasets: their generalization capability can easily be driven by the abundance of data rather than by the architecture design. In this paper, we harness the information from the three-dimensional representation to proficiently capture local features, while introducing the range image representation to incorporate additional information and facilitate fast computation. A GPU-based KDTree enables rapid building and querying, and enhances the projection with straightforward operations. Extensive experiments on the SemanticKITTI and nuScenes datasets demonstrate the benefits of our modifications in a "small data" setup, in which only one sequence of the dataset is used to train the models, as well as in the conventional setup, where all sequences except one are used for training. We show that a reduced version of our model not only demonstrates strong competitiveness against full-scale state-of-the-art models but also operates in real time, making it a viable choice for real-world applications. The code of our method is available at https://github.com/Bender97/WaffleAndRange.
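As a rough illustration of the two representations mentioned above, the Python sketch below builds a spherical range image from a LiDAR scan and queries a KD-tree for local 3D neighborhoods. It is a simplified stand-in, not the paper's implementation: the image size and vertical field of view follow common SemanticKITTI conventions, and a CPU KD-tree (SciPy) replaces the GPU-based one.

    import numpy as np
    from scipy.spatial import cKDTree

    def range_projection(points, H=64, W=1024, fov_up=3.0, fov_down=-25.0):
        """Project an (N, 3) point cloud onto an (H, W) range image."""
        fov_up, fov_down = np.radians(fov_up), np.radians(fov_down)
        r = np.linalg.norm(points, axis=1) + 1e-8
        yaw = np.arctan2(points[:, 1], points[:, 0])
        pitch = np.arcsin(points[:, 2] / r)
        u = (0.5 * (1.0 - yaw / np.pi) * W).astype(np.int32) % W
        v = ((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H).clip(0, H - 1).astype(np.int32)
        image = np.full((H, W), -1.0, dtype=np.float32)
        image[v, u] = r                      # keep the last range falling in each pixel
        return image

    points = np.random.uniform(-20, 20, size=(5000, 3))
    image = range_projection(points)         # 2D representation for fast computation
    tree = cKDTree(points)                    # 3D neighborhoods for local features
    _, knn_idx = tree.query(points, k=8)
    print(image.shape, knn_idx.shape)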
WasteGAN: Data Augmentation for Robotic Waste Sorting through Generative Adversarial Networks
Bacchin, Alberto, Barcellona, Leonardo, Terreran, Matteo, Ghidoni, Stefano, Menegatti, Emanuele, Kiyokawa, Takuya
Robotic waste sorting poses significant challenges in both perception and manipulation, given the extreme variability of objects that must be recognized on a cluttered conveyor belt. While deep learning has proven effective in solving complex tasks, the need for extensive data collection and labeling limits its applicability in real-world scenarios like waste sorting. To tackle this issue, we introduce a data augmentation method based on a novel GAN architecture called wasteGAN. The proposed method increases the performance of semantic segmentation models starting from a very limited set of labeled examples, as few as 100. The key innovations of wasteGAN include a novel loss function, a novel activation function, and a larger generator block. Together, these innovations help the network learn from a limited number of examples and synthesize data that better mirrors real-world distributions. We then leverage the higher-quality segmentation masks predicted by models trained on the wasteGAN synthetic data to compute semantic-aware grasp poses, enabling a robotic arm to effectively recognize contaminants and separate waste in a real-world scenario. Through a comprehensive evaluation encompassing dataset-based assessments and real-world experiments, our methodology demonstrated promising potential for robotic waste sorting, yielding performance gains of up to 5.8% in picking contaminants. The project page is available at https://github.com/bach05/wasteGAN.git
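The augmentation idea can be sketched as follows in PyTorch. This is only a conceptual illustration under assumed shapes and placeholder networks: wasteGAN's actual loss function, activation function, and enlarged generator block are not reproduced here, and the synthetic masks would in practice come from the trained GAN rather than be random.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset, ConcatDataset

    # A small pool of real labeled images (here: random tensors standing in for data).
    real_imgs, real_masks = torch.rand(100, 3, 64, 64), torch.randint(0, 2, (100, 64, 64))

    # Placeholder generator; wasteGAN's architecture is not reproduced here.
    generator = nn.Sequential(nn.ConvTranspose2d(16, 3, 4, 4), nn.Sigmoid())
    with torch.no_grad():
        synth_imgs = generator(torch.randn(400, 16, 16, 16))
        synth_masks = torch.randint(0, 2, (400, 64, 64))   # placeholder synthetic labels

    # Train a segmentation model on the union of real and GAN-augmented data.
    train_set = ConcatDataset([TensorDataset(real_imgs, real_masks),
                               TensorDataset(synth_imgs, synth_masks)])
    loader = DataLoader(train_set, batch_size=16, shuffle=True)

    seg_model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 2, 1))
    opt = torch.optim.Adam(seg_model.parameters(), lr=1e-3)
    for imgs, masks in loader:
        opt.zero_grad()
        loss = nn.functional.cross_entropy(seg_model(imgs), masks)
        loss.backward()
        opt.step()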
Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models
Barcellona, Leonardo, Bacchin, Alberto, Terreran, Matteo, Menegatti, Emanuele, Ghidoni, Stefano
The ability of a robot to pick an object, known as robot grasping, is crucial for several applications, such as assembly or sorting. In such tasks, selecting the right target to pick is as essential as inferring a correct configuration of the gripper. A common solution to this problem relies on semantic segmentation models, which often show poor generalization to unseen objects and require considerable time and massive amounts of data to be trained. To reduce the need for large datasets, some grasping pipelines exploit few-shot semantic segmentation models, which are capable of recognizing new classes given a few examples. However, this often comes at the cost of limited performance, and fine-tuning is required for these models to be effective in robot grasping scenarios. In this work, we propose to overcome all these limitations by combining the impressive generalization capability of foundation models with a high-performing few-shot classifier that works as a score function to select the segmentation closest to the support set. The proposed model is designed to be embedded in a grasp synthesis pipeline. Extensive experiments using one or five examples show that our novel approach overcomes existing performance limitations, improving the state of the art both in few-shot semantic segmentation on the Graspnet-1B (+10.5% mIoU) and Ocid-grasp (+1.6% AP) datasets and in real-world few-shot grasp synthesis (+21.7% grasp accuracy). The project page is available at: https://leobarcellona.github.io/showandgrasp.github.io/
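The selection idea described above can be illustrated with a hedged Python sketch: candidate masks from a zero-shot segmenter are scored against the support set, and the closest one is kept. The mean-color "embedding" is a toy stand-in for the few-shot classifier's learned features, and the mask proposals here are random; none of this reflects the paper's actual models.

    import numpy as np

    def embed(image, mask):
        # Toy feature: mean color of the masked region (placeholder for a learned embedding).
        return image[mask].mean(axis=0) if mask.any() else np.zeros(3)

    def select_mask(image, candidate_masks, support_images, support_masks):
        # Support prototype: average embedding of the few labeled examples.
        prototype = np.mean([embed(im, m) for im, m in zip(support_images, support_masks)], axis=0)
        # Score each zero-shot proposal by its distance to the prototype, keep the closest.
        scores = [-np.linalg.norm(embed(image, m) - prototype) for m in candidate_masks]
        return candidate_masks[int(np.argmax(scores))]

    rng = np.random.default_rng(0)
    query = rng.random((64, 64, 3))
    candidates = [rng.random((64, 64)) > 0.5 for _ in range(5)]   # stand-ins for foundation-model masks
    support_ims = [rng.random((64, 64, 3)) for _ in range(5)]     # 5-shot support set
    support_msk = [rng.random((64, 64)) > 0.5 for _ in range(5)]
    best = select_mask(query, candidates, support_ims, support_msk)
    print(best.shape)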
A Graph-based Optimization Framework for Hand-Eye Calibration for Multi-Camera Setups
Evangelista, Daniele, Olivastri, Emilio, Allegro, Davide, Menegatti, Emanuele, Pretto, Alberto
Hand-eye calibration is the problem of estimating the spatial transformation between a reference frame, usually the base of a robot arm or its gripper, and the reference frame of one or multiple cameras. Generally, this calibration is solved as a non-linear optimization problem; what is rarely done, however, is to exploit the underlying graph structure of the problem itself. Indeed, hand-eye calibration can be seen as an instance of the Simultaneous Localization and Mapping (SLAM) problem. Inspired by this fact, in this work we present a pose-graph approach to the hand-eye calibration problem that extends a recent state-of-the-art solution in two different ways: i) by formulating the solution for eye-on-base setups with one camera; ii) by covering multi-camera robotic setups. The proposed approach has been validated in simulation against standard hand-eye calibration methods. Moreover, a real-world application is presented. In both scenarios, the proposed approach outperforms all alternative methods. We release with this paper an open-source implementation of our graph-based optimization framework for multi-camera setups.
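As background for the kind of constraint such a graph encodes, the Python sketch below sets up the classic eye-on-base relation X · A_i = B_i · Y, where X is the unknown base-to-camera transform, Y the unknown gripper-to-marker transform, A_i a marker detection in the camera frame, and B_i the robot kinematics, and solves it with a generic non-linear least-squares routine on synthetic data. It illustrates the constraint only; the paper's pose-graph formulation and multi-camera extension are not reproduced here.

    import numpy as np
    from scipy.spatial.transform import Rotation as R
    from scipy.optimize import least_squares

    def pose_to_mat(p):
        # p = [rx, ry, rz, tx, ty, tz]: axis-angle rotation followed by translation.
        T = np.eye(4)
        T[:3, :3] = R.from_rotvec(p[:3]).as_matrix()
        T[:3, 3] = p[3:]
        return T

    def residuals(params, A_list, B_list):
        # One 6-DoF error term per measurement: X @ A_i should equal B_i @ Y.
        X, Y = pose_to_mat(params[:6]), pose_to_mat(params[6:])
        res = []
        for A, B in zip(A_list, B_list):
            E = np.linalg.inv(B @ Y) @ (X @ A)   # identity if the constraint holds
            res.extend(R.from_matrix(E[:3, :3]).as_rotvec())
            res.extend(E[:3, 3])
        return np.asarray(res)

    # Synthetic, noiseless example: ground-truth X, Y and a few robot poses.
    rng = np.random.default_rng(1)
    X_gt = pose_to_mat(rng.normal(scale=0.2, size=6))
    Y_gt = pose_to_mat(rng.normal(scale=0.2, size=6))
    B_list = [pose_to_mat(rng.normal(scale=0.2, size=6)) for _ in range(4)]
    A_list = [np.linalg.inv(X_gt) @ B @ Y_gt for B in B_list]
    sol = least_squares(residuals, np.zeros(12), args=(A_list, B_list))
    print(sol.cost)  # close to zero; in practice a closed-form method would seed x0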
Pushing the Limits of Learning-based Traversability Analysis for Autonomous Driving on CPU
Fusaro, Daniel, Olivastri, Emilio, Evangelista, Daniele, Imperoli, Marco, Menegatti, Emanuele, Pretto, Alberto
Self-driving vehicles and autonomous ground robots require a reliable and accurate method to analyze the traversability of the surrounding environment for safe navigation. This paper proposes and evaluates a real-time, machine learning-based traversability analysis method that combines geometric features with appearance-based features in a hybrid approach based on an SVM classifier. In particular, we show that integrating a new set of geometric and visual features and focusing on important implementation details enables a noticeable boost in performance and reliability. The proposed approach has been compared with state-of-the-art deep learning approaches on a public dataset of outdoor driving scenarios. It reaches an accuracy of 89.2% in scenarios of varying complexity, demonstrating its effectiveness and robustness. The method runs entirely on the CPU, reaches results comparable to those of the other methods, operates faster, and requires fewer hardware resources.
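A minimal Python sketch of the hybrid idea follows (not the paper's exact feature set or pipeline): per-cell geometric statistics are concatenated with appearance statistics and fed to an SVM classifier. The features and labels below are synthetic and only illustrate the structure.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    def cell_features(points, colors):
        # Geometric: height statistics and a flatness proxy from the covariance eigenvalues.
        z = points[:, 2]
        evals = np.sort(np.linalg.eigvalsh(np.cov(points.T)))
        geometric = [z.mean(), z.std(), z.max() - z.min(), evals[0] / (evals.sum() + 1e-9)]
        # Appearance: mean and standard deviation of the image colors projected into the cell.
        appearance = list(colors.mean(axis=0)) + list(colors.std(axis=0))
        return np.array(geometric + appearance)

    rng = np.random.default_rng(0)
    X = np.stack([cell_features(rng.normal(size=(50, 3)), rng.random((50, 3))) for _ in range(200)])
    y = rng.integers(0, 2, 200)          # 1 = traversable, 0 = non-traversable (synthetic labels)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X, y)
    print(clf.score(X, y))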
People Tracking in Panoramic Video for Guiding Robots
Bacchin, Alberto, Berno, Filippo, Menegatti, Emanuele, Pretto, Alberto
A guiding robot aims to effectively bring people to and from specific places within environments that are possibly unknown to them. During this operation the robot should be able to detect and track the accompanied person, trying never to lose sight of them. One way to minimize such losses is to use an omnidirectional camera: its 360° Field of View (FoV) guarantees that a framed object cannot leave the FoV unless it is occluded or very far from the sensor. However, the acquired panoramic videos introduce new challenges in perception tasks such as people detection and tracking, including the large size of the images to be processed, the distortion effects introduced by the cylindrical projection, and the periodic nature of panoramic images. In this paper, we propose a set of targeted methods that effectively adapt to panoramic videos a standard people detection and tracking pipeline originally designed for perspective cameras. Our methods have been implemented and tested inside a deep learning-based people detection and tracking framework with a commercial 360° camera. Experiments performed on datasets specifically acquired for guiding robot applications and on a real service robot show the effectiveness of the proposed approach over other state-of-the-art systems. We release with this paper the acquired and annotated datasets and the open-source implementation of our method.
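One of the panoramic-specific issues mentioned above, the periodic (wrap-around) nature of the frames, can be handled by padding the image horizontally before detection and folding the resulting boxes back. The Python sketch below illustrates this idea only; the detector is a placeholder and the actual pipeline, padding width, and seam handling differ.

    import numpy as np

    def detect(image):
        # Placeholder detector: returns boxes as (x1, y1, x2, y2) in padded coordinates.
        return np.array([[10.0, 20.0, 60.0, 120.0]])

    def detect_panoramic(frame, pad=64):
        H, W = frame.shape[:2]
        # Wrap the left and right borders so people crossing the seam stay whole.
        padded = np.concatenate([frame[:, -pad:], frame, frame[:, :pad]], axis=1)
        boxes = detect(padded)
        boxes[:, [0, 2]] -= pad              # back to original coordinates
        boxes[:, [0, 2]] %= W                # fold coordinates across the seam
        # Boxes that still straddle the seam (x1 > x2) would need splitting/merging.
        return boxes

    frame = np.zeros((400, 2000, 3), dtype=np.uint8)   # synthetic panoramic frame
    print(detect_panoramic(frame))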
RoboCup-2003: New Scientific and Technical Advances
Pagello, Enrico, Menegatti, Emanuele, Bredenfeld, Ansgar, Costa, Paulo, Christaller, Thomas, Jacoff, Adam, Polani, Daniel, Riedmiller, Martin, Saffiotti, Alessandro, Sklar, Elizabeth, Tomoichi, Takashi
RoboCup is no longer just the Soccer World Cup for autonomous robots but has evolved to become a coordinated initiative encompassing four different robotics events: (1) Soccer, (2) Rescue, (3) Junior (focused on education), and (4) a Scientific Symposium. RoboCup-2003 took place from 2 to 11 July 2003 in Padua (Italy); it was colocated with other scientific events in the field of AI and robotics. In this article, in addition to reporting on the results of the games, we highlight the robotics and AI technologies exploited by the teams in the different leagues and describe the most meaningful scientific contributions.