Precision Harvesting in Cluttered Environments: Integrating End Effector Design with Dual Camera Perception

Koe, Kendall; Shah, Poojan Kalpeshbhai; Walt, Benjamin; Westphal, Jordan; Marri, Samhita; Kamtikar, Shivani; Nam, James Seungbum; Uppalapati, Naveen Kumar; Krishnan, Girish; Chowdhary, Girish

arXiv.org Artificial Intelligence 

Abstract-- Labor shortages in specialty crop industries have created a need for robotic automation to increase agricultural efficiency and productivity. Previous manipulation systems perform well when harvesting in uncluttered, structured environments. High tunnel environments are more compact and cluttered, requiring a rethinking of large-form-factor systems and grippers. We propose a novel co-designed framework incorporating a global detection camera and a local eye-in-hand camera that demonstrates precise localization of small fruits via closed-loop visual feedback and reliable error handling. Field experiments in high tunnels show that our system can reach 85.0% of cherry tomato fruit on average, taking 10.98 s on average.

I. INTRODUCTION

Decreasing food miles and increasing sustainable agricultural practices have prompted interest in urban agriculture in recent years.

Figure 1: Robot picking cherry tomatoes with our Detect2Grasp