Just Add Force for Contact-Rich Robot Policies

William Xie, Stefan Caldararu, Nikolaus Correll

arXiv.org Artificial Intelligence 

Robot foundation models [1, 2, 3, 4, 5, 6, 7, 8] leverage large-scale datasets spanning diverse objects, scenes, and embodiments to produce generalizable, cross-platform robot policies. The underlying data is limited to a few modalities: vision, language, and robot action; most typically, a workspace camera view, a text annotation of the task, end-effector pose, and a binary (open or closed) gripper position [3]. The latter, binary gripper position, especially without force feedback, precludes robot foundation models from successfully grasping many delicate objects such as soft produce, brittle dried goods, paper containers, and other fragile or deformable items. In this paper, we propose a modification to this archetypal structure: continuous, rather than binary, gripper positions and corresponding grasp force feedback. We contribute 1) a novel dataset of 130 trajectories with continuous gripper position and force feedback, spanning 30 unique objects that vary in deformability, volume, and mass (from 1 g to 500 g), and 2) diffusion policies [9] trained with and without force feedback, showing that force feedback enables delicate grasping on par with state-of-the-art LLM-based methods at nearly 4x lower latency, with promise of generalizability at greater data scale. Our position is that force, a strong supervisory signal of contact and grasp success, together with continuous gripper position rather than binary open/closed states, should be included in future datasets used to train robot foundation models. Our current-draw-based force sensing method is gripper-agnostic and requires no special hardware ("skin" or otherwise). While this sensing is noisier and less accurate than bespoke solutions, policies trained on our data are capable of delicate grasps. Improved resolution and frequency of force and other tactile signals would likely further improve grasp fidelity and robustness.
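As a rough illustration of the data representation we advocate, the sketch below shows how a per-step observation might fold in a continuous gripper position and a current-draw-based force estimate alongside the usual vision and proprioception features. This is not the paper's actual pipeline: the calibration constants, feature dimensions, and function names (`estimate_grasp_force`, `build_observation`) are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical calibration: map gripper motor current draw (A) to grasp force (N).
# The paper's actual calibration is not reproduced here; a linear fit against a
# reference force sensor is one simple possibility.
CURRENT_TO_FORCE_GAIN = 8.0   # N per ampere (assumed, illustrative)
IDLE_CURRENT = 0.15           # A drawn with no contact (assumed, illustrative)

def estimate_grasp_force(motor_current_a: float) -> float:
    """Rough grasp-force estimate from motor current draw; clipped at zero so
    free-space motion (current at or below idle) reads as no contact."""
    return max(0.0, (motor_current_a - IDLE_CURRENT) * CURRENT_TO_FORCE_GAIN)

def build_observation(image_embedding: np.ndarray,
                      ee_pose: np.ndarray,
                      gripper_position: float,
                      motor_current_a: float) -> np.ndarray:
    """Concatenate vision and proprioception features with a continuous gripper
    position in [0, 1] and the estimated grasp force, so a policy can condition
    on contact information rather than a binary open/closed gripper bit."""
    force = estimate_grasp_force(motor_current_a)
    return np.concatenate([image_embedding,
                           ee_pose,                        # e.g. xyz + quaternion
                           [gripper_position, force]])

# Example: one low-dimensional observation for a single control step.
obs = build_observation(image_embedding=np.zeros(512),
                        ee_pose=np.array([0.4, 0.0, 0.2, 0.0, 0.0, 0.0, 1.0]),
                        gripper_position=0.35,
                        motor_current_a=0.42)
print(obs.shape)  # (521,)
```

On the action side, the analogous change is for the policy to predict a continuous gripper position target (and, optionally, a desired grasp force) instead of a binary open/close command.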