interactive navigation
Interactive Navigation with Adaptive Non-prehensile Mobile Manipulation
Dai, Cunxi, Liu, Xiaohan, Sreenath, Koushil, Li, Zhongyu, Hollis, Ralph
This paper introduces a framework for interactive navigation through adaptive non-prehensile mobile manipulation. A key challenge in this process is handling objects with unknown dynamics, which are difficult to infer from visual observation. To address this, we propose an adaptive dynamics model for common movable indoor objects via learned SE(2) dynamics representations. This model is integrated into Model Predictive Path Integral (MPPI) control to guide the robot's interactions. Additionally, the learned dynamics help inform decision-making when navigating around objects that cannot be manipulated. Our approach is validated in both simulation and real-world scenarios, demonstrating its ability to accurately represent object dynamics and effectively manipulate various objects. We further highlight its success in the Navigation Among Movable Objects (NAMO) task by deploying the proposed framework on a dynamically balancing mobile robot, Shmoobot. Project website: https://cmushmoobot.github.io/AdaptivePushing/.
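To make the control loop concrete, here is a minimal Python sketch of MPPI driving a pushed object's SE(2) pose toward a goal. It assumes a hypothetical object model in which a body-frame twist is scaled by a single per-object gain, standing in for the learned, adaptive dynamics representation described above. The names se2_step, estimate_gain and mppi_plan, the position-only quadratic cost, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def se2_step(pose, u, gain, dt=0.1):
    """Hypothetical object model (assumption): a body-frame twist u = (vx, vy, w),
    scaled by a per-object gain, moves the SE(2) pose (x, y, theta).
    Works on a single pose of shape (3,) or a batch of shape (K, 3)."""
    v = gain * u
    th = pose[..., 2]
    c, s = np.cos(th), np.sin(th)
    dx = c * v[..., 0] - s * v[..., 1]  # rotate body-frame velocity into the world frame
    dy = s * v[..., 0] + c * v[..., 1]
    return pose + dt * np.stack([dx, dy, v[..., 2]], axis=-1)

def estimate_gain(pose, u, next_pose, dt=0.1):
    """Least-squares gain from one observed step (the stand-in model is linear in the gain)."""
    unit = se2_step(pose, u, 1.0, dt) - pose  # displacement predicted with gain = 1
    obs = next_pose - pose                    # displacement actually observed
    return float(obs @ unit / (unit @ unit))

def mppi_plan(pose, goal, gain, horizon=20, samples=512, lam=1.0,
              sigma=1.0, iters=10, seed=0):
    """Return a (horizon, 3) nominal twist sequence that pushes the object toward goal."""
    rng = np.random.default_rng(seed)
    U = np.zeros((horizon, 3))
    for _ in range(iters):
        eps = rng.normal(0.0, sigma, size=(samples, horizon, 3))  # control perturbations
        P = np.tile(pose, (samples, 1))
        cost = np.zeros(samples)
        for t in range(horizon):
            P = se2_step(P, U[t] + eps[:, t], gain)
            cost += np.sum((P[:, :2] - goal[:2]) ** 2, axis=1)  # position error only
        w = np.exp(-(cost - cost.min()) / lam)  # path-integral (softmin) weights
        w /= w.sum()
        U = U + np.tensordot(w, eps, axes=1)    # weighted average of perturbations
    return U
```

In a receding-horizon loop, only the first twist of U would be applied before observing the object, re-estimating the gain, and replanning; a learned dynamics representation would replace the single-gain update with a richer, data-driven one.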
CaMP: Causal Multi-policy Planning for Interactive Navigation in Multi-room Scenes
Visual navigation has been widely studied under the assumption that there may be several clear routes to reach the goal. Interactive Navigation (InterNav) considers agents navigating to their goals more effectively with object interactions, posing new challenges of learning interaction dynamics and handling an extra action space. Previous works learn a single vision-to-action policy with the guidance of designed representations. However, the causality between actions and outcomes is prone to be confounded when the attributes of obstacles are diverse and hard to measure. Learning a policy for long-term action planning in complex scenes also leads to extensive, inefficient exploration. In this paper, we introduce a causal diagram of InterNav that clarifies the confounding bias caused by obstacles.
IN-Sight: Interactive Navigation through Sight
Schoch, Philipp, Yang, Fan, Ma, Yuntao, Leutenegger, Stefan, Hutter, Marco, Leboutet, Quentin
Current visual navigation systems often treat the environment as static, lacking the ability to adaptively interact with obstacles. This limitation leads to navigation failure when encountering unavoidable obstructions. In response, we introduce IN-Sight, a novel approach to self-supervised path planning, enabling more effective navigation strategies through interaction with obstacles. Utilizing RGB-D observations, IN-Sight calculates traversability scores and incorporates them into a semantic map, facilitating long-range path planning in complex, maze-like environments. To precisely navigate around obstacles, IN-Sight employs a local planner, trained imperatively on a differentiable costmap using representation learning techniques. The entire framework undergoes end-to-end training within the state-of-the-art photorealistic Intel SPEAR Simulator. We validate the effectiveness of IN-Sight through extensive benchmarking in a variety of simulated scenarios and ablation studies. Moreover, we demonstrate the system's real-world applicability with zero-shot sim-to-real transfer, deploying our planner on the legged robot platform ANYmal, showcasing its practical potential for interactive navigation in real environments.
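The idea of turning per-cell traversability scores into long-range paths can be sketched with plain A* on a grid. This is an assumption-laden illustration, not IN-Sight's planner: the score grid, the cell cost 1 + alpha * (1 - score), the blocked threshold, and the function name plan_on_traversability are all hypothetical.

```python
import heapq

def plan_on_traversability(score, start, goal, alpha=5.0, blocked=0.1):
    """A* on a 4-connected grid of traversability scores in [0, 1] (hypothetical setup).

    Entering a cell costs 1 + alpha * (1 - score); cells scoring below
    `blocked` are treated as impassable. Returns (path, cost) or (None, inf)."""
    rows, cols = len(score), len(score[0])

    def h(cell):
        # Manhattan distance: admissible because every step costs at least 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    g = {start: 0.0}
    came_from = {start: None}
    frontier = [(h(start), 0.0, start)]
    while frontier:
        _, cost, cur = heapq.heappop(frontier)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1], cost
        if cost > g[cur]:
            continue  # stale queue entry
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and score[nr][nc] >= blocked:
                new_cost = cost + 1.0 + alpha * (1.0 - score[nr][nc])
                if new_cost < g.get(nxt, float("inf")):
                    g[nxt] = new_cost
                    came_from[nxt] = cur
                    heapq.heappush(frontier, (new_cost + h(nxt), new_cost, nxt))
    return None, float("inf")
```

If the only gap in a wall is a cell with moderate traversability (for example, an obstacle the robot could push), this planner accepts that cell's higher cost instead of failing, which is the behaviour a static free/occupied map cannot express.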
Interactive-FAR: Interactive, Fast and Adaptable Routing for Navigation Among Movable Obstacles in Complex Unknown Environments
He, Botao, Chen, Guofei, Wang, Wenshan, Zhang, Ji, Fermuller, Cornelia, Aloimonos, Yiannis
This paper introduces a real-time algorithm for navigating complex unknown environments cluttered with movable obstacles. Our algorithm achieves fast, adaptable routing by actively attempting to manipulate obstacles during path planning and adjusting the global plan based on sensor feedback. The main contributions include an improved dynamic Directed Visibility Graph (DV-graph) for rapid global path searching, a real-time interaction planning method that adapts online to new sensory perceptions, and a comprehensive framework designed for interactive navigation in complex unknown or partially known environments. Our algorithm can replan the global path within several milliseconds. It can also attempt to move obstacles, update their affordances, and adapt its strategy accordingly. Extensive experiments validate that our algorithm reduces travel time by 33%, achieves up to 49% higher path efficiency, and runs orders of magnitude faster than traditional methods in complex environments. In these experiments, it was the most efficient solution, in terms of both speed and path efficiency, for interactive navigation in environments of such complexity. We also release our code as an open-source Docker demo to facilitate future research.
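The attempt, update, and replan loop described above can be illustrated with a small sketch. It uses ordinary Dijkstra search on a hand-built directed graph rather than the paper's DV-graph, and the affordance labels ("unknown", "movable", "fixed"), the flat push_cost penalty, and the try_push callback are hypothetical stand-ins for the real interaction planner and sensor feedback.

```python
import heapq
import math

def shortest_path(edges, affordance, start, goal, push_cost=2.0):
    """Dijkstra over edges = {node: [(neighbour, length, obstacle_id or None), ...]}.
    Edges blocked by an obstacle known to be 'fixed' are skipped; edges through
    untested or movable obstacles pay an extra (hypothetical) push_cost."""
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, length, obs in edges.get(u, []):
            if obs is not None and affordance.get(obs) == "fixed":
                continue  # known immovable obstacle: edge unusable
            w = length + (push_cost if obs is not None else 0.0)
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, (u, obs)
                heapq.heappush(pq, (d + w, v))
    if goal not in dist:
        return None
    path, node = [], goal
    while node != start:
        u, obs = prev[node]
        path.append((u, node, obs))
        node = u
    return path[::-1]

def interactive_navigate(edges, start, goal, try_push, max_replans=10):
    """Plan, try to move obstacles on the plan, record their affordance, replan on failure."""
    affordance = {}
    for _ in range(max_replans):
        path = shortest_path(edges, affordance, start, goal)
        if path is None:
            return None, affordance
        for _, _, obs in path:
            if obs is not None and affordance.get(obs) != "movable":
                affordance[obs] = "movable" if try_push(obs) else "fixed"
                if affordance[obs] == "fixed":
                    break  # this plan is invalid; replan with the updated affordance
        else:
            return [start] + [v for _, v, _ in path], affordance
    return None, affordance
```

A real system would replan from the robot's current pose after each interaction and keep richer affordance estimates; this sketch always replans from the start node for brevity.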
Interactive Navigation in Environments with Traversable Obstacles Using Large Language and Vision-Language Models
Zhang, Zhen, Lin, Anran, Wong, Chun Wai, Chu, Xiangyu, Dou, Qi, Au, K. W. Samuel
This paper proposes an interactive navigation framework that uses large language and vision-language models to allow robots to navigate in environments with traversable obstacles. We utilize a large language model (GPT-3.5) and an open-set vision-language model (Grounding DINO) to create an action-aware costmap for effective path planning without fine-tuning. With these large models, we achieve an end-to-end system that goes from textual instructions such as "Can you pass through the curtains to deliver medicines to me?" to bounding boxes (e.g., curtains) with action-aware attributes. These boxes are used to segment LiDAR point clouds into traversable and untraversable parts, from which an action-aware costmap is constructed to generate a feasible path. The pre-trained large models generalize well and do not require additional annotated training data, allowing fast deployment in interactive navigation tasks. For verification, we use multiple traversable objects, such as curtains and grass, and instruct the robot to traverse them. We also tested traversing curtains in a medical scenario. All experimental results demonstrated the proposed framework's effectiveness and adaptability to diverse environments.
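A simplified version of the action-aware costmap step might look like the following. It assumes the detected boxes have already been projected into the ground plane as axis-aligned rectangles carrying a traversable flag (in the paper, attributes come from GPT-3.5 and boxes from Grounding DINO); the function name, grid size, resolution, and the cost values trav_cost and lethal are illustrative assumptions.

```python
import numpy as np

def action_aware_costmap(points, boxes, size=(10, 10), res=0.5,
                         trav_cost=10, lethal=254):
    """Build a 2D costmap from LiDAR hits and labelled boxes (simplified sketch).

    points: (N, 2) array of LiDAR hits in metres, map frame, origin at cell (0, 0).
    boxes:  list of (xmin, ymin, xmax, ymax, traversable) rectangles in the same
            frame (assumed already projected from image-space detections).
    The grid is indexed as grid[ix, iy]."""
    grid = np.zeros(size, dtype=np.uint8)
    in_traversable = np.zeros(len(points), dtype=bool)
    for xmin, ymin, xmax, ymax, traversable in boxes:
        if traversable:
            in_traversable |= ((points[:, 0] >= xmin) & (points[:, 0] <= xmax) &
                               (points[:, 1] >= ymin) & (points[:, 1] <= ymax))
    cells = np.floor(points / res).astype(int)
    for (ix, iy), trav in zip(cells, in_traversable):
        if not (0 <= ix < size[0] and 0 <= iy < size[1]):
            continue  # hit falls outside the map
        cost = trav_cost if trav else lethal  # traversable hits get a low, non-lethal cost
        grid[ix, iy] = max(int(grid[ix, iy]), cost)  # lethal wins if both land in one cell
    return grid
```

Giving traversable hits a small nonzero cost, rather than clearing them, lets a downstream planner still prefer open floor when it is available.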
Target Reaching Behaviour for Unfreezing the Robot in a Semi-Static and Crowded Environment
Robot navigation in semi-static, crowded human environments can lead to the freezing problem, where the robot cannot move because humans are standing on its path and no other path is available. Classical approaches to robot navigation do not provide a solution for this problem. In such situations, the robot could interact with the humans to clear its path instead of treating them as inanimate obstacles. In this work, we propose a behavior for a wheeled humanoid robot that complies with social norms for clearing its path when the robot is frozen due to the presence of humans. The behavior consists of two modules: 1) a detection module, which uses the YOLOv3 algorithm trained to detect human hands and arms; and 2) a gesture module, which uses a policy trained in simulation with the Proximal Policy Optimization (PPO) algorithm. The two modules are orchestrated using the ROS framework.
Interactive Gibson: A Benchmark for Interactive Navigation in Cluttered Environments
Xia, Fei, Shen, William B., Li, Chengshu, Kasimbeg, Priya, Tchapmi, Micael, Toshev, Alexander, Martín-Martín, Roberto, Savarese, Silvio
We present Interactive Gibson, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task. For example, the robot can move objects if needed to clear a path to the goal location. Our benchmark comprises two novel elements: 1) a new experimental setup, the Interactive Gibson Environment, which simulates high-fidelity visuals of indoor scenes and high-fidelity physical dynamics of the robot and common objects found in these scenes; and 2) a set of Interactive Navigation metrics that allow one to study the interplay between navigation and physical interaction. We present and evaluate multiple learning-based baselines in Interactive Gibson and provide insights into regimes of navigation with different trade-offs between navigation path efficiency and disturbance of surrounding objects.
Classical robot navigation is concerned with reaching goals while avoiding collisions [1], [2]. This definition of navigation is motivated by a wide variety of robot applications in factories or outdoor settings. As robots are increasingly deployed in complex and cluttered environments, physical interactions while navigating become not only unavoidable but necessary. For example, when operating in a cluttered home, a robot might need to push objects aside or open doors to reach its destination. This problem is referred to as Interactive Navigation, and in this paper we propose a principled and systematic way to study it (see Figure 1). The "aversion to interaction" in mobile robots is easy to understand: real robots are expensive, and interacting with the environment presents safety risks.
In robotic manipulation, these challenges have been addressed by extensive use of physics simulation engines [3], [4], [5], which simulate object and robot dynamics with high precision and thus allow one to study manipulation safely. Further, these engines can be used to train models that are deployable in the real world.
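One way to picture the trade-off between path efficiency and disturbance of surrounding objects is a weighted score that mixes the two. The sketch below is a simplified assumption rather than the benchmark's exact metric definitions: path efficiency is taken as shortest-path length over travelled length on success, effort efficiency as the robot's share of total mass-weighted displacement, and alpha weights the two. All function names are hypothetical.

```python
def path_efficiency(success, shortest_len, actual_len):
    """Success-gated ratio of shortest-path length to travelled length (assumed form)."""
    if not success or actual_len <= 0:
        return 0.0
    return shortest_len / max(actual_len, shortest_len)

def effort_efficiency(success, robot_mass, robot_disp, obj_masses, obj_disps):
    """Robot's share of total mass-weighted displacement (assumed form): 1.0 means
    no object was disturbed; pushing heavy objects far lowers the score."""
    if not success:
        return 0.0
    robot_effort = robot_mass * robot_disp
    total = robot_effort + sum(m * d for m, d in zip(obj_masses, obj_disps))
    return robot_effort / total if total > 0 else 0.0

def interactive_nav_score(p_eff, e_eff, alpha=0.5):
    """alpha near 1 rewards short paths; alpha near 0 rewards leaving objects untouched."""
    return alpha * p_eff + (1.0 - alpha) * e_eff
```

Under these assumed definitions, a successful run that travels 12.5 m against a 10 m shortest path, with a 50 kg robot nudging one 5 kg object by 1 m, scores 0.8 on path efficiency and about 0.99 on effort efficiency; sweeping alpha then exposes which agents trade disturbance for shorter paths.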
HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators
Li, Chengshu, Xia, Fei, Martin-Martin, Roberto, Savarese, Silvio
Most common navigation tasks in human environments require auxiliary arm interactions, e.g., opening doors, pressing buttons, and pushing obstacles away. This type of navigation task, which we call Interactive Navigation, requires the use of mobile manipulators: mobile bases with manipulation capabilities. Interactive Navigation tasks are usually long-horizon and composed of heterogeneous phases of pure navigation, pure manipulation, and their combination. Using the wrong part of the embodiment is inefficient and hinders progress. We propose HRL4IN, a novel Hierarchical RL architecture for Interactive Navigation tasks. HRL4IN exploits the exploration benefits of HRL over flat RL for long-horizon tasks thanks to temporally extended commitments towards subgoals. Different from other HRL solutions, HRL4IN handles the heterogeneous nature of the Interactive Navigation task by creating subgoals in different spaces in different phases of the task. Moreover, HRL4IN selects different parts of the embodiment to use for each phase, improving energy efficiency. We evaluate HRL4IN against flat PPO and HAC, a state-of-the-art HRL algorithm, on Interactive Navigation in two environments: a 2D grid-world environment and a 3D environment with physics simulation. We show that HRL4IN significantly outperforms its baselines in terms of task performance and energy efficiency. More information is available at https://sites.google.com/view/hrl4in.
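The two HRL4IN ideas highlighted above, committing to a subgoal for several steps and choosing which part of the embodiment to use, can be sketched as a control loop. Everything in it is hypothetical: the 5-D action layout (2 base dimensions, 3 arm dimensions), the "base"/"arm"/"both" choices, the fixed commit interval, and the L1 action magnitude used as an energy proxy. In HRL4IN both levels are learned policies, whereas here they are plain callables.

```python
import numpy as np

# Hypothetical action layout: 2 base dims (linear, angular velocity), 3 arm joint velocities.
BASE, ARM = slice(0, 2), slice(2, 5)

def embodiment_mask(choice):
    """Map 'base', 'arm' or 'both' to a 0/1 mask over the 5-D action vector."""
    mask = np.zeros(5)
    if choice in ("base", "both"):
        mask[BASE] = 1.0
    if choice in ("arm", "both"):
        mask[ARM] = 1.0
    return mask

def run_episode(high_policy, low_policy, obs0, step_fn, horizon=20, commit=5):
    """high_policy(obs) -> (subgoal, choice); low_policy(obs, subgoal) -> raw 5-D action.
    The high level is queried only every `commit` steps (temporal commitment)."""
    obs, energy = obs0, 0.0
    subgoal, mask = None, np.zeros(5)
    for t in range(horizon):
        if t % commit == 0:
            subgoal, choice = high_policy(obs)
            mask = embodiment_mask(choice)
        # Unselected embodiment parts receive zero command and stay still.
        action = mask * np.asarray(low_policy(obs, subgoal), dtype=float)
        energy += float(np.abs(action).sum())  # crude energy proxy
        obs = step_fn(obs, action)
    return obs, energy
```

With a constant low-level action of all ones over 20 steps, restricting the embodiment to the base uses 40 units of this proxy energy instead of 100, which is the kind of saving that motivates letting the high level pick the embodiment.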