Learning Social Navigation from Positive and Negative Demonstrations and Rule-Based Specifications
Kim, Chanwoo, Yoon, Jihwan, Kim, Hyeonseong, Jeong, Taemoon, Yoo, Changwoo, Lee, Seungbeen, Byeon, Soohwan, Chung, Hoon, Pan, Matthew, Oh, Jean, Lee, Kyungjae, Choi, Sungjoon
arXiv.org Artificial Intelligence
Abstract: Mobile robot navigation in dynamic human environments requires policies that balance adaptability to diverse behaviors with compliance with safety constraints. We hypothesize that integrating data-driven rewards with rule-based objectives enables navigation policies to achieve a more effective balance of adaptability and safety. To this end, we develop a framework that learns a density-based reward from positive and negative demonstrations and augments it with rule-based objectives for obstacle avoidance and goal reaching. A sampling-based lookahead controller produces supervisory actions that are both safe and adaptive, which are subsequently distilled into a compact student policy suitable for real-time operation with uncertainty estimates. Experiments in synthetic and elevator co-boarding simulations show consistent gains in success rate and time efficiency over baselines, and real-world demonstrations with human participants confirm the practicality of deployment. Mobile robot navigation in crowded, human-shared environments is inherently safety-critical and requires policies that remain reliable while adapting to diverse human behaviors.
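The combination of a density-based reward with rule-based objectives and a sampling-based lookahead controller can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel density estimate, the constant-velocity rollouts, the penalty weight, and every function and parameter name below (`kde`, `reward`, `lookahead_action`, `safe_dist`, `bandwidth`) are assumptions chosen for illustration. The student-policy distillation and uncertainty estimation steps are not shown.

```python
import numpy as np


def kde(points, query, bandwidth=0.5):
    """Mean Gaussian kernel value of each query (M, 2) against points (N, 2)."""
    d2 = ((query[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).mean(axis=1)


def reward(states, pos_demos, neg_demos, goal, obstacles, safe_dist=0.5):
    """Density-based term plus rule-based goal and obstacle terms (assumed form)."""
    # Data-driven part: high near positive demonstrations, low near negative ones.
    density = kde(pos_demos, states) - kde(neg_demos, states)
    # Rule-based part 1: negative distance to the goal (goal reaching).
    goal_term = -np.linalg.norm(states - goal, axis=1)
    # Rule-based part 2: penalty when closer than safe_dist to any obstacle.
    dists = np.linalg.norm(
        states[:, None, :] - obstacles[None, :, :], axis=-1
    ).min(axis=1)
    obstacle_term = -10.0 * np.maximum(0.0, safe_dist - dists)  # weight is an assumption
    return density + goal_term + obstacle_term


def lookahead_action(state, pos_demos, neg_demos, goal, obstacles, rng,
                     n_samples=256, horizon=5, dt=0.2, max_speed=1.0):
    """Sample candidate velocities, roll each out, return the best-scoring one."""
    vel = rng.uniform(-max_speed, max_speed, size=(n_samples, 2))
    scores = np.zeros(n_samples)
    for t in range(1, horizon + 1):
        # Constant-velocity rollout of every candidate to step t.
        states = state + vel * dt * t
        scores += reward(states, pos_demos, neg_demos, goal, obstacles)
    return vel[np.argmax(scores)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical demonstrations: positives along y = 0, negatives along y = -1.
    xs = np.linspace(0.0, 3.0, 20)
    pos = np.stack([xs, np.zeros(20)], axis=1)
    neg = np.stack([xs, -np.ones(20)], axis=1)
    goal = np.array([3.0, 0.0])
    obstacles = np.array([[10.0, 10.0]])
    print(lookahead_action(np.zeros(2), pos, neg, goal, obstacles, rng))
```

In a setup like the one in the abstract, actions chosen this way would serve as supervision targets for a smaller policy; here the controller is only queried once to show the interface.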
Oct-15-2025