OVAMOS: A Framework for Open-Vocabulary Multi-Object Search in Unknown Environments

Wang, Qianwei, Xu, Yifan, Kamat, Vineet, Menassa, Carol

arXiv.org Artificial Intelligence 

Abstract: Object search is a fundamental task for robots deployed in indoor building environments, yet it is complicated by observation instability, especially when open-vocabulary models are used. While foundation models (LLMs/VLMs) enable reasoning about object locations even when the objects are not directly visible, the ability to recover from failures and replan remains crucial. To address these challenges, we propose a framework that integrates VLM-based reasoning, frontier-based exploration, and a Partially Observable Markov Decision Process (POMDP) to solve the multi-object search (MOS) problem in novel environments. The VLM improves search efficiency by inferring object-environment relationships. Frontier-based exploration guides navigation through unknown spaces. The POMDP models observation uncertainty, allowing the robot to recover from failures caused by occlusion and clutter. We evaluate our framework on 120 simulated scenarios across several Habitat-Matterport3D (HM3D) scenes and in a real-world robot experiment in a 50-square-meter office, demonstrating significant improvements in both efficiency and success rate over baseline methods.

INTRODUCTION

Multi-Object Search (MOS) is a crucial task in robotics [1]. Consider a workplace setting in which a robot must retrieve multiple objects to complete a task, such as gathering the documents, tools, or equipment needed for an assembly process.
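The abstract credits the POMDP component with modeling observation uncertainty so that the robot can recover from failures such as occlusion. The sketch below illustrates that idea with a plain Bayes-filter belief update over a few candidate locations for one static object. It is not the authors' observation model, which this excerpt does not specify. The location names, the detector rates `p_detect` and `p_false`, and the VLM-style prior are all hypothetical values chosen for illustration. The sketch covers belief tracking only, not POMDP planning.

```python
from typing import Dict, Set

Belief = Dict[str, float]


def normalize(belief: Belief) -> Belief:
    """Rescale unnormalized scores so they sum to 1."""
    total = sum(belief.values())
    return {loc: p / total for loc, p in belief.items()}


def update_belief(
    belief: Belief,
    in_view: Set[str],
    detected: bool,
    p_detect: float = 0.8,  # hypothetical P(detection | object's location in view)
    p_false: float = 0.1,   # hypothetical P(detection | object's location not in view)
) -> Belief:
    """One Bayes-filter step for a single static target object.

    Each location's prior is multiplied by the likelihood of the observed
    detector output (detected or not), then the result is renormalized.
    """
    posterior = {}
    for loc, prior in belief.items():
        p_hit = p_detect if loc in in_view else p_false
        likelihood = p_hit if detected else 1.0 - p_hit
        posterior[loc] = prior * likelihood
    return normalize(posterior)


# Hypothetical prior, e.g. scores from a VLM asked where a "coffee mug" is likely to be.
prior = {"kitchen": 0.5, "office": 0.3, "hallway": 0.2}

# The robot looks at the kitchen, but the mug is occluded, so nothing is detected.
b1 = update_belief(prior, in_view={"kitchen"}, detected=False)
# The kitchen drops to 2/11 (about 0.18) but is not ruled out.

# A later view from another angle detects the mug.
b2 = update_belief(b1, in_view={"kitchen"}, detected=True)
print({loc: round(p, 3) for loc, p in b2.items()})
# kitchen rises to about 0.64
```

Because the assumed `p_detect` is below 1, a single missed detection lowers the kitchen's probability without eliminating it, so a planner acting on this belief can still return to the kitchen. A deterministic detector model would discard the kitchen permanently after the occluded view.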