Interaction force estimation for tactile sensor arrays: Toward tactile-based interaction control for robotic fingers
Chelly, Elie, Cherubini, Andrea, Fraisse, Philippe, Amar, Faiz Ben, Khoramshahi, Mahdi
Accurate estimation of interaction forces is crucial for achieving fine, dexterous control in robotic systems. Although tactile sensor arrays offer rich sensing capabilities, their effective use has been limited by challenges such as calibration complexity, nonlinearity, and deformation. In this paper, we tackle these issues by presenting a novel method for 3D force estimation using tactile sensor arrays. Unlike existing approaches that focus on specific or decoupled force components, our method estimates full 3D interaction forces across an array of distributed sensors, providing comprehensive real-time feedback. Through systematic data collection and model training, our approach overcomes the limitations of prior methods, achieving accurate and reliable tactile-based force estimation. Furthermore, we integrate this estimate into a real-time control loop, enabling implicit, stable force regulation that is critical for precise robotic manipulation. Experimental validation on the Allegro robot hand with uSkin sensors demonstrates the effectiveness of our approach in real-time control, and its ability to enhance the robot's adaptability and dexterity.
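To make the learned mapping concrete, here is a minimal sketch (our own assumption, not the paper's actual model): a small PyTorch MLP that regresses a 3D contact force from raw tactile-array readings, assuming each taxel reports a 3-axis value as uSkin elements do. The array size, architecture, and all names are hypothetical.

```python
# Minimal sketch: regress a 3D contact force from tactile-array readings.
import torch
import torch.nn as nn

N_TAXELS = 24  # hypothetical array size; the real sensor layout may differ

class TactileForceNet(nn.Module):
    def __init__(self, n_taxels: int = N_TAXELS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * n_taxels, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 3),  # (Fx, Fy, Fz)
        )

    def forward(self, taxels: torch.Tensor) -> torch.Tensor:
        # taxels: (batch, n_taxels, 3) raw readings -> (batch, 3) force
        return self.net(taxels.flatten(start_dim=1))

# Training on (readings, ground-truth force) pairs, e.g. labeled with an
# external force/torque sensor during systematic data collection:
model = TactileForceNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
readings = torch.randn(32, N_TAXELS, 3)  # stand-in for collected data
forces = torch.randn(32, 3)              # stand-in for F/T labels
loss = nn.functional.mse_loss(model(readings), forces)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Once trained, such a model runs per control cycle, so its output can feed the real-time force-regulation loop described above.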
Hard-Attention Gates with Gradient Routing for Endoscopic Image Computing
Roffo, Giorgio, Biffi, Carlo, Salvagnini, Pietro, Cherubini, Andrea
To address overfitting and enhance model generalization in gastroenterological polyp size assessment, our study introduces Feature-Selection Gates (FSG), or Hard-Attention Gates (HAG), alongside Gradient Routing (GR) for dynamic feature selection. This technique aims to boost Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) by promoting sparse connectivity, thereby reducing overfitting and enhancing generalization. HAG achieves this through sparsification with learnable weights, serving as a regularization strategy. GR further refines this process by optimizing HAG parameters via dual forward passes, independently from the main model, to improve feature re-weighting. Our evaluation spanned multiple datasets, including CIFAR-100 for a broad impact assessment and specialized endoscopic datasets (REAL-Colon, Misawa, and SUN) focusing on polyp size estimation, covering over 200 polyps in more than 370,000 frames. The findings indicate that our HAG-enhanced networks substantially improve performance in both binary and triclass classification tasks related to polyp sizing. Specifically, CNNs improved their F1 score to 87.8% in binary classification, while in triclass classification the ViT-T model reached an F1 score of 76.5%, outperforming traditional CNNs and ViT-T models. To facilitate further research, we are releasing our codebase, which includes implementations for CNNs, multistream CNNs, ViTs, and HAG-augmented variants. This resource aims to standardize the use of endoscopic datasets, providing public training-validation-testing splits for reliable and comparable research in gastroenterological polyp size estimation. The codebase is available at github.com/cosmoimd/feature-selection-gates.
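The gate-plus-routing mechanism can be pictured with a minimal sketch (our reading of the abstract, not the released codebase): a per-feature gate with learnable logits, and a dual-pass update in which the gate weights get their own optimizer, kept separate from the backbone's.

```python
# Minimal sketch: per-feature gating plus a dual-pass "gradient routing"
# update. All dimensions and names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HardAttentionGate(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gates in [0, 1] re-weight (and can effectively prune) features,
        # promoting the sparse connectivity described in the abstract.
        return x * torch.sigmoid(self.logits)

backbone = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
gate = HardAttentionGate(64)
head = nn.Linear(64, 3)  # e.g. a triclass polyp-size output

# Routing: the backbone/head and the gate have separate optimizers, so
# each update touches only its own parameter group.
opt_model = torch.optim.Adam([*backbone.parameters(), *head.parameters()], lr=1e-3)
opt_gate = torch.optim.Adam(gate.parameters(), lr=1e-2)

x, y = torch.randn(8, 64), torch.randint(0, 3, (8,))

# Pass 1: update the main model; gate logits are left untouched.
opt_model.zero_grad()
F.cross_entropy(head(gate(backbone(x))), y).backward()
opt_model.step()

# Pass 2: an independent forward pass whose step updates only the gates.
opt_gate.zero_grad()
F.cross_entropy(head(gate(backbone(x))), y).backward()
opt_gate.step()
```

The two-optimizer split is one simple way to realize "optimizing HAG parameters via dual forward passes, independently from the main model"; the released code may route gradients differently.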
Can robots mold soft plastic materials by shaping depth images?
Gursoy, Ege, Tarbouriech, Sonny, Cherubini, Andrea
Can robots mold soft plastic materials by shaping depth images? The short answer is no: current-day robots can't. In this article, we address the problem of shaping plastic material with an anthropomorphic arm/hand robot, which observes the material with a fixed depth camera. Robots capable of molding could assist humans in many tasks, such as cooking, scooping or gardening. Yet, the problem is complex, due to its high dimensionality at both the perception and control levels. To address it, we design three alternative data-based methods for predicting the effect of robot actions on the material. The robot can then plan the sequence of actions and their positions to mold the material into a desired shape. To make the prediction problem tractable, we rely on two original ideas. First, we prove that under reasonable assumptions, the shaping problem can be mapped from point cloud to depth image space, with many benefits (simpler processing, no need for registration, lower computation time and memory requirements). Second, we design a novel, simple metric for quickly measuring the distance between two depth images. The metric is based on the inherent point cloud representation of depth images, which enables direct and consistent comparison of image pairs through a non-uniform scaling approach, and therefore opens promising perspectives for designing depth-image-based robot controllers. We assess our approach in a series of unprecedented experiments, where a robotic arm/hand molds flour from initial to final shapes, either with its own dataset, or by transfer learning from a human dataset. We conclude the article by discussing the limitations of our framework and those of current-day hardware, which make human-like robot molding a challenging open research problem.
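One plausible form of such a metric, sketched below under our own assumptions (the paper's exact formula may differ): treat each depth image as the point cloud it implicitly encodes, apply a non-uniform scaling that normalizes each axis by its own range, and average the point-to-point distances, which pair up index-by-index since the camera is fixed.

```python
# Minimal sketch of a depth-image distance; an assumed formulation, not
# the authors' exact metric.
import numpy as np

def depth_image_distance(d1: np.ndarray, d2: np.ndarray) -> float:
    """Mean distance between the point clouds encoded by two HxW depth images."""
    assert d1.shape == d2.shape
    h, w = d1.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)
    z_scale = max(np.ptp(d1), np.ptp(d2), 1e-9)  # shared depth range

    def cloud(depth):
        # Non-uniform scaling: u, v, z are each normalized by their own
        # range so the three axes contribute comparably.
        return np.stack([u / w, v / h, depth / z_scale], -1).reshape(-1, 3)

    # Fixed camera => pixel grids align, so points pair up index-by-index
    # and no registration step is needed.
    return float(np.linalg.norm(cloud(d1) - cloud(d2), axis=1).mean())

# Example: the distance shrinks as the current shape approaches the target.
target = np.zeros((48, 64))
target[16:32, 20:44] = 0.05  # a desired mound of material
print(depth_image_distance(np.zeros((48, 64)), target))
```

Avoiding nearest-neighbor matching is what makes a metric like this cheap enough to evaluate inside an action-planning loop.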
Towards vision-based dual arm robotic fruit harvesting
Gursoy, Ege, Navarro, Benjamin, Cosgun, Akansel, Kulić, Dana, Cherubini, Andrea
Interest in agricultural robotics has increased considerably in recent years due to benefits such as improved productivity and reduced labor. However, the unstructured environments involved make the development of robotic harvesters challenging. Most research in agricultural robotics focuses on single-arm manipulation. Here, we propose a dual-arm approach. We present a dual-arm fruit harvesting robot equipped with an RGB-D camera and cutting and collecting tools. We exploit the cooperative task description to maximize the capabilities of the dual-arm robot. We designed a Hierarchical Quadratic Programming based control strategy to fulfill the set of hard constraints related to the robot and environment: robot joint limits, robot self-collisions, and robot-fruit and robot-tree collisions. We combine deep learning and standard image processing algorithms to detect and track fruits, as well as the tree trunk, in the scene. We validate our perception methods on real-world RGB-D images and our control method in simulated experiments.
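A single level of such a controller can be sketched as a quadratic program (an assumed formulation with made-up dimensions, not the authors' implementation): minimize the task-tracking error subject to hard constraints such as joint-velocity limits, with lower-priority tasks solved afterwards under the constraint of preserving this level's optimum.

```python
# Minimal sketch: one priority level of a hierarchical QP velocity
# controller, solved with cvxpy. Dimensions and values are stand-ins.
import numpy as np
import cvxpy as cp

n = 7                                    # hypothetical joints per arm
J = np.random.randn(6, n)                # task Jacobian (stand-in values)
v_des = np.array([0.05, 0, 0, 0, 0, 0])  # desired end-effector twist
dq_max = 0.5 * np.ones(n)                # joint-velocity limits

dq = cp.Variable(n)
# Priority-1 objective: track the task. Hard constraints: joint limits.
# Collision-avoidance constraints would enter as further inequalities.
prob = cp.Problem(cp.Minimize(cp.sum_squares(J @ dq - v_des)),
                  [dq <= dq_max, dq >= -dq_max])
prob.solve()
print("joint velocity command:", dq.value)
```

In the full hierarchy, each subsequent QP adds an equality constraint fixing the residual achieved by the levels above it, so lower-priority tasks (e.g., posture) can never degrade higher-priority ones (e.g., collision avoidance).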