 visual servoing


Efficient Self-Supervised Neuro-Analytic Visual Servoing for Real-time Quadrotor Control

arXiv.org Artificial Intelligence

This work introduces a self-supervised, cost-efficient neuro-analytic model for vision-based quadrotor control in which a small 1.7M-parameter student ConvNet learns automatically from an analytical teacher, an improved image-based visual servoing (IBVS) controller. Our IBVS system solves numerical instabilities by reducing the classical visual servoing equations and enabling efficient, stable image feature detection. Through knowledge distillation, the student model achieves 11x faster inference than the teacher IBVS pipeline, while demonstrating similar control accuracy at a significantly lower computational and memory cost. Our vision-only self-supervised neuro-analytic control enables quadrotor orientation and movement without requiring explicit geometric models or fiducial markers. The proposed methodology leverages simulation-to-reality transfer learning and is validated on a small drone platform in GPS-denied indoor environments. Our key contributions include: (1) an analytical IBVS teacher that solves numerical instabilities inherent in classical approaches, (2) a two-stage segmentation pipeline combining YOLOv11 with a U-Net-based mask splitter for robust anterior-posterior vehicle segmentation to correctly estimate the orientation of the target, and (3) an efficient dual-path knowledge distillation system, which transfers geometric visual servoing capabilities from the analytical IBVS teacher to a compact student neural network that outperforms the teacher while being suitable for real-time onboard deployment.


A Novel Feedforward Youla Parameterization Method for Avoiding Local Minima in Stereo Image Based Visual Servoing Control

arXiv.org Artificial Intelligence

In robot navigation and manipulation, accurately determining the camera's pose relative to the environment is crucial for effective task execution. In this paper, we systematically prove that this problem corresponds to the Perspective-3-Point (P3P) formulation, where exactly three known 3D points and their corresponding 2D image projections are used to estimate the pose of a stereo camera. In image-based visual servoing (IBVS) control, the system becomes overdetermined, as the 6 degrees of freedom (DoF) of the stereo camera must align with 9 observed 2D features in the scene. When more constraints are imposed than available DoFs, global stability cannot be guaranteed, as the camera may become trapped in a local minimum far from the desired configuration during servoing. To address this issue, we propose a novel control strategy for accurately positioning a calibrated stereo camera. Our approach integrates a feedforward controller with a Youla parameterization-based feedback controller, ensuring robust servoing performance. Through simulations, we demonstrate that our method effectively avoids local minima and enables the camera to reach the desired pose accurately and efficiently.
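
For context, the overdetermination argument can be made concrete with the classical IBVS control law. This is the standard textbook form, not the Youla-based controller proposed in the paper, and the symbols (feature vector s, desired features s*, interaction matrix L, gain λ, camera velocity v) are notation assumed here for illustration.

```latex
\begin{aligned}
e &= s - s^{*} \in \mathbb{R}^{9}, \qquad \dot{e} = L\,v, \qquad L \in \mathbb{R}^{9 \times 6},\\
v &= -\lambda\, L^{+} e, \qquad L^{+} = (L^{\top} L)^{-1} L^{\top} \quad (\operatorname{rank} L = 6),\\
v &= 0 \ \text{with}\ e \neq 0 \iff e \in \ker L^{\top},\ e \neq 0 .
\end{aligned}
```

Because L has 9 rows and only 6 columns, the null space of its transpose is at least 3-dimensional, so a nonzero feature error can produce a zero velocity command. That is the local-minimum situation the abstract refers to.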


Deep Visual Servoing of an Aerial Robot Using Keypoint Feature Extraction

arXiv.org Artificial Intelligence

The problem of image-based visual servoing (IBVS) of an aerial robot using deep-learning-based keypoint detection is addressed in this article. A monocular RGB camera mounted on the platform is utilized to collect the visual data. A convolutional neural network (CNN) is then employed to extract the features serving as the visual data for the servoing task. This paper contributes to the field not only by circumventing the need for man-made marker detection in conventional visual servoing techniques, but also by enhancing the robustness against undesirable factors including occlusion, varying illumination, clutter, and background changes, thereby broadening the applicability of perception-guided motion control tasks in aerial robots. Additionally, extensive physics-based ROS Gazebo simulations are conducted to assess the effectiveness of this method, in contrast to many existing studies that rely solely on physics-less simulations. A demonstration video is available at https://youtu.be/Dd2Her8Ly-E.


One-Shot Dual-Arm Imitation Learning

arXiv.org Artificial Intelligence

We introduce One-Shot Dual-Arm Imitation Learning (ODIL), which enables dual-arm robots to learn precise and coordinated everyday tasks from just a single demonstration of the task. ODIL uses a new three-stage visual servoing (3-VS) method for precise alignment between the end-effector and target object, after which replay of the demonstration trajectory is sufficient to perform the task. This is achieved without requiring prior task or object knowledge, or additional data collection and training following the single demonstration. Furthermore, we propose a new dual-arm coordination paradigm for learning dual-arm tasks from a single demonstration. ODIL was tested on a real-world dual-arm robot, demonstrating state-of-the-art performance across six precise and coordinated tasks in both 4-DoF and 6-DoF settings, and showing robustness in the presence of distractor objects and partial occlusions. Videos are available at: https://www.robot-learning.uk/one-shot-dual-arm.


ViT-VS: On the Applicability of Pretrained Vision Transformer Features for Generalizable Visual Servoing

arXiv.org Artificial Intelligence

Visual servoing enables robots to precisely position their end-effector relative to a target object. While classical methods rely on hand-crafted features and thus are universally applicable without task-specific training, they often struggle with occlusions and environmental variations, whereas learning-based approaches improve robustness but typically require extensive training. We present a visual servoing approach that leverages pretrained vision transformers for semantic feature extraction, combining the advantages of both paradigms while also being able to generalize beyond the provided sample. Our approach achieves full convergence in unperturbed scenarios and surpasses classical image-based visual servoing by up to 31.2% relative improvement in perturbed scenarios. Even the convergence rates of learning-based methods are matched despite requiring no task- or object-specific training. Real-world evaluations confirm robust performance in end-effector positioning, industrial box manipulation, and grasping of unseen objects using only a reference from the same category. Our code and simulation environment are available at: https://alessandroscherl.github.io/ViT-VS/


Hybrid Visual Servoing of Tendon-driven Continuum Robots

arXiv.org Artificial Intelligence

HVS outperforms DLBVS in iteration time, error reduction, and control smoothness. Experimental validation confirms HVS effectiveness under occlusions and noise. This paper introduces a novel Hybrid Visual Servoing (HVS) approach for controlling tendon-driven continuum robots (TDCRs). The HVS system combines Image-Based Visual Servoing (IBVS) with Deep Learning-Based Visual Servoing (DLBVS) to overcome the limitations of each method and improve overall performance. IBVS offers higher accuracy and faster convergence in feature-rich environments, while DLBVS enhances robustness against disturbances and offers a larger workspace. By enabling smooth transitions between IBVS and DLBVS, the proposed HVS ensures effective control in dynamic, unstructured environments. The effectiveness of this approach is validated through simulations and real-world experiments, demonstrating that HVS achieves reduced iteration time, faster convergence, lower final error, and smoother performance compared to DLBVS alone, while maintaining DLBVS's robustness in challenging conditions such as occlusions, lighting changes, actuator noise, and physical impacts.


Imagine2Servo: Intelligent Visual Servoing with Diffusion-Driven Goal Generation for Robotic Tasks

arXiv.org Artificial Intelligence

Visual servoing, the method of controlling robot motion through feedback from visual sensors, has seen significant advancements with the integration of optical flow-based methods. However, its application remains limited by inherent challenges, such as the necessity for a target image at test time, the requirement of substantial overlap between initial and target images, and the reliance on feedback from a single camera. This paper introduces Imagine2Servo, an innovative approach leveraging diffusion-based image editing techniques to enhance visual servoing algorithms by generating intermediate goal images. This methodology allows for the extension of visual servoing applications beyond traditional constraints, enabling tasks like long-range navigation and manipulation without predefined goal images. We propose a pipeline that synthesizes subgoal images grounded in the task at hand, facilitating servoing in scenarios with minimal initial and target image overlap and integrating multi-camera feedback for comprehensive task execution. Our contributions demonstrate a novel application of image generation to robotic control, significantly broadening the capabilities of visual servoing systems. Real-world experiments validate the effectiveness and versatility of the Imagine2Servo framework in accomplishing a variety of tasks, marking a notable advancement in the field of visual servoing.


Visual Servoing for Robotic On-Orbit Servicing: A Survey

arXiv.org Artificial Intelligence

On-orbit servicing (OOS) activities will power the next big step for sustainable exploration and commercialization of space. Developing robotic capabilities for autonomous OOS operations is a priority for the space industry. Visual Servoing (VS) enables robots to achieve the precise manoeuvres needed for critical OOS missions by utilizing visual information for motion control. This article presents an overview of existing VS approaches for autonomous OOS operations with space manipulator systems (SMS). We divide the approaches according to their contribution to the typical phases of a robotic OOS mission: a) Recognition, b) Approach, and c) Contact. We also present a discussion on the reviewed VS approaches, identifying current trends. Finally, we highlight the challenges and areas for future research on VS techniques for robotic OOS.


Robot Agnostic Visual Servoing considering kinematic constraints enabled by a decoupled network trajectory planner structure

arXiv.org Artificial Intelligence

We propose a visual servoing method consisting of a detection network and a velocity trajectory planner. First, the detection network estimates the object's position and orientation in the image space. These estimates are then normalized and filtered. The resulting direction and orientation are the input to the trajectory planner, which considers the kinematic constraints of the robotic system in use. This allows safe and stable control, since the kinematic boundary values are taken into account in planning. Also, because direction estimation and velocity planning are separated, the learning part of the method does not directly influence the control value. This also enables the transfer of the method to different robotic systems without retraining, making it robot agnostic. We evaluate our method on different visual servoing tasks with and without clutter on two different robotic systems. Our method achieved mean absolute position errors of <0.5 mm and orientation errors of <1°. Additionally, we transferred the method to a new system which differs in robot and camera, emphasizing the robot-agnostic capability of our method.
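
The decoupling described above, where a learned detector supplies only a direction and a separate planner enforces the robot's limits, can be illustrated with a minimal sketch. This is not the authors' planner: the `KinematicLimits` class, the `plan_velocity` function, the per-axis acceleration bound, and all numeric values are hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class KinematicLimits:
    """Hypothetical limits; real values depend on the robot."""
    v_max: float  # maximum speed, m/s
    a_max: float  # maximum per-axis acceleration, m/s^2


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def plan_velocity(direction, distance, v_prev, limits, dt, gain=1.0):
    """Return the next velocity command along a unit direction vector.

    direction: unit vector toward the target (assumed detector output).
    distance:  normalized remaining distance in [0, 1] (assumed detector output).
    v_prev:    previously commanded velocity vector.
    """
    # Desired speed scales with the remaining distance and never exceeds v_max.
    speed = clamp(gain * distance * limits.v_max, 0.0, limits.v_max)
    v_des = [speed * d for d in direction]
    # Bound the per-step change so the acceleration limit holds on every axis.
    dv_max = limits.a_max * dt
    return [vp + clamp(vd - vp, -dv_max, dv_max) for vd, vp in zip(v_des, v_prev)]


if __name__ == "__main__":
    limits = KinematicLimits(v_max=0.1, a_max=0.5)
    v = [0.0, 0.0, 0.0]
    for _ in range(5):
        v = plan_velocity([1.0, 0.0, 0.0], 1.0, v, limits, dt=0.01)
        print(v)
```

Because the limits live entirely in the planner, swapping robots in this sketch only means passing a different `KinematicLimits` instance; the detector output format stays the same.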


Direct Imitation Learning-based Visual Servoing using the Large Projection Formulation

arXiv.org Artificial Intelligence

Today, robots must be safe, versatile, and user-friendly to operate in unstructured and human-populated environments. Dynamical system-based imitation learning enables robots to perform complex tasks stably and without explicit programming, greatly simplifying their real-world deployment. To exploit the full potential of these systems, it is crucial to implement closed loops that use visual feedback. Vision makes it possible to cope with environmental changes, but is complex to handle due to the high dimension of the image space. This study introduces a dynamical system-based imitation learning approach for direct visual servoing. It leverages off-the-shelf deep learning-based perception backbones to extract robust features from the raw input image, and an imitation learning strategy to execute sophisticated robot motions. The learning blocks are integrated using the large projection task priority formulation. As demonstrated through extensive experimental analysis, the proposed method realizes complex tasks with a robotic manipulator.