9a6b278218966499194491f55ccf8b75-Supplemental-Conference.pdf

Neural Information Processing Systems

The unit $\ell_2$-sphere in $d$ dimensions centered at the origin is denoted by $\mathbb{S}^{d-1}$. Additionally, given a pair of symmetric matrices $A, B \in \mathbb{R}^{d \times d}$, we write $A \succeq B$ if and only if $x^\top (A - B) x \geq 0$ for all $x \in \mathbb{R}^d$. More linear algebra facts appear in Appendix E. Let $\mathcal{V} \subseteq \mathcal{P}$ be a subset of distributions indexed by the points in the hypercube $\mathcal{E}_d = \{-1, 1\}^d$. For a number of facts from probability and statistics (both related and unrelated to exponential families), we refer the reader to Appendix F.
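As a rough illustration of the notation above, the following sketch checks the matrix ordering numerically and enumerates the hypercube points. It is not taken from the paper: the function names `loewner_geq` and `hypercube`, the tolerance `tol`, and the example matrices are all assumptions made for this example. The check relies on the standard fact that a symmetric matrix has a nonnegative quadratic form everywhere exactly when its smallest eigenvalue is nonnegative.

```python
import itertools

import numpy as np


def loewner_geq(A, B, tol=1e-10):
    """Return True if x^T (A - B) x >= 0 for all x, for symmetric A and B.

    Hypothetical helper: tests the smallest eigenvalue of A - B,
    allowing a small numerical tolerance.
    """
    diff = np.asarray(A, dtype=float) - np.asarray(B, dtype=float)
    # eigvalsh assumes a symmetric input and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(diff).min() >= -tol)


def hypercube(d):
    """List the 2^d points of the hypercube {-1, 1}^d as NumPy vectors."""
    return [np.array(v) for v in itertools.product((-1, 1), repeat=d)]


# Example matrices (assumed for illustration): A - B = diag(1, 2).
A = np.array([[2.0, 0.0], [0.0, 3.0]])
B = np.eye(2)

print(loewner_geq(A, B))   # A dominates B in this ordering
print(loewner_geq(B, A))   # the reverse does not hold
print(len(hypercube(3)))   # number of index points for d = 3
```

The eigenvalue test is used here because it settles the condition for every vector at once; evaluating the quadratic form at sampled vectors would only give a necessary check.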


ClipRover: Zero-shot Vision-Language Exploration and Target Discovery by Mobile Robots

Zhang, Yuxuan, Abdullah, Adnan, Koppal, Sanjeev J., Islam, Md Jahidul

arXiv.org Artificial Intelligence

Vision-language navigation (VLN) has emerged as a promising paradigm, enabling mobile robots to perform zero-shot inference and execute tasks without specific pre-programming. However, current systems often separate map exploration and path planning, with exploration relying on inefficient algorithms due to limited (partially observed) environmental information. In this paper, we present a novel navigation pipeline named "ClipRover" for simultaneous exploration and target discovery in unknown environments, leveraging the capabilities of a vision-language model named CLIP. Our approach requires only monocular vision and operates without any prior map or knowledge about the target. For comprehensive evaluations, we design the functional prototype of a UGV (unmanned ground vehicle) system named "Rover Master", a customized platform for general-purpose VLN tasks. We integrate and deploy the ClipRover pipeline on Rover Master to evaluate its throughput, obstacle avoidance capability, and trajectory performance across various real-world scenarios. Experimental results demonstrate that ClipRover consistently outperforms traditional map traversal algorithms and achieves performance comparable to path-planning methods that depend on prior map and target knowledge. Notably, ClipRover offers real-time active navigation without requiring pre-captured candidate images or pre-built node graphs, addressing key limitations of existing VLN pipelines.