Calib3R: A 3D Foundation Model for Multi-Camera to Robot Calibration and 3D Metric-Scaled Scene Reconstruction

Allegro, Davide, Terreran, Matteo, Ghidoni, Stefano

arXiv.org Artificial Intelligence 

RELATED WORKS Hand-Eye Calibration: Hand-eye calibration is a well-established problem in robotics that aims to estimate the relative pose between a camera and a robot's end-effector. It is typically addressed by capturing a series of images of a known calibration pattern (e.g., a checkerboard) using a camera rigidly mounted on the robot hand, and using both the images and the corresponding robot poses to compute the camera's extrinsic parameters. Different mathematical formulations exist for solving hand-eye calibration; a widely adopted approach involves solving the equation AX = XB, where X is the unknown rigid transformation describing the pose of the camera with respect to the robot, while A and B denote the relative motions of the end-effector (from robot kinematics) and the camera (from pattern observations), respectively [31], [36]-[38]. Several other approaches were proposed: Shah [39] formulated a closed-form solution for the hand-eye problem by using an algorithm based on Singular V alue Decomposition (SVD) and the Kronecker product to solve for rotation and translation separately, while Li et al. [40] used dual quaternions to solve them simultaneously overcoming the limitations of the Kronecker product. Wang et al. [23] extended hand-eye calibration to multi-camera setups by incorporating a common reference frame but required an external motion capture system, limiting its applicability to small setups. Andreff and Heller [41], [42] proposed two similar hand-eye calibration methods that leverage the Structure-from-Motion (SfM) paradigm to estimate camera motion and introduce a formulation for hand-eye calibration that includes a factor to metrically scale camera poses.