route description
Words to Wheels: Vision-Based Autonomous Driving Understanding Human Language Instructions Using Foundation Models
Ryu, Chanhoe, Seong, Hyunki, Lee, Daegyu, Moon, Seongwoo, Min, Sungjae, Shim, D. Hyunchul
This paper introduces an innovative application of foundation models, enabling Unmanned Ground Vehicles (UGVs) equipped with an RGB-D camera to navigate to designated destinations based on human language instructions. Unlike learning-based methods, this approach does not require prior training but instead leverages existing foundation models, thus facilitating generalization to novel environments. Upon receiving human language instructions, these are transformed into a 'cognitive route description' using a large language model (LLM)-a detailed navigation route expressed in human language. The vehicle then decomposes this description into landmarks and navigation maneuvers. The vehicle also determines elevation costs and identifies navigability levels of different regions through a terrain segmentation model, GANav, trained on open datasets. Semantic elevation costs, which take both elevation and navigability levels into account, are estimated and provided to the Model Predictive Path Integral (MPPI) planner, responsible for local path planning. Concurrently, the vehicle searches for target landmarks using foundation models, including YOLO-World and EfficientViT-SAM. Ultimately, the vehicle executes the navigation commands to reach the designated destination, the final landmark. Our experiments demonstrate that this application successfully guides UGVs to their destinations following human language instructions in novel environments, such as unfamiliar terrain or urban settings.
- North America > United States (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report (1.00)
- Overview > Innovation (0.34)
- Transportation > Ground > Road (0.51)
- Information Technology > Robotics & Automation (0.51)
- Automobiles & Trucks (0.51)
How Much is too Much: Exploring the Effect of Verbal Route Description Length on Indoor Navigation
N, Fathima Nourin, Pramanick, Pradip, Sarkar, Chayan
Navigating through a new indoor environment can be stressful. Recently, many places have deployed robots to assist visitors. One of the features of such robots is escorting the visitors to their desired destination within the environment, but this is neither scalable nor necessary for every visitor. Instead, a robot assistant could be deployed at a strategic location to provide wayfinding instructions. This not only increases the user experience but can be helpful in many time-critical scenarios e.g., escorting someone to their boarding gate at an airport. However, delivering route descriptions verbally poses a challenge. If the description is too verbose, people may struggle to recall all the information, while overly brief descriptions may be simply unhelpful. This article focuses on studying the optimal length of verbal route descriptions that are effective for reaching the destination and easy for people to recall. This work proposes a theoretical framework that links route segments to chunks in working memory. Based on this framework, an experiment is designed and conducted to examine the effects of route descriptions of different lengths on navigational performance. The results revealed intriguing patterns suggesting an ideal length of four route segments. This study lays a foundation for future research exploring the relationship between route description lengths, working memory capacity, and navigational performance in indoor environments.
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.94)
Lana: A Language-Capable Navigator for Instruction Following and Generation
Wang, Xiaohan, Wang, Wenguan, Shao, Jiayi, Yang, Yi
Recently, visual-language navigation (VLN) -- entailing robot agents to follow navigation instructions -- has shown great advance. However, existing literature put most emphasis on interpreting instructions into actions, only delivering "dumb" wayfinding agents. In this article, we devise LANA, a language-capable navigation agent which is able to not only execute human-written navigation commands, but also provide route descriptions to humans. This is achieved by simultaneously learning instruction following and generation with only one single model. More specifically, two encoders, respectively for route and language encoding, are built and shared by two decoders, respectively, for action prediction and instruction generation, so as to exploit cross-task knowledge and capture task-specific characteristics. Throughout pretraining and fine-tuning, both instruction following and generation are set as optimization objectives. We empirically verify that, compared with recent advanced task-specific solutions, LANA attains better performances on both instruction following and route description, with nearly half complexity. In addition, endowed with language generation capability, LANA can explain to humans its behaviors and assist human's wayfinding. This work is expected to foster future efforts towards building more trustworthy and socially-intelligent navigation robots.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
On Qualitative Route Descriptions: Representation and Computational Complexity
Westphal, Matthias (University of Freiburg) | Wölfl, Stefan (University of Freiburg) | Nebel, Bernhard (University of Freiburg) | Renz, Jochen (The Australian National University)
The generation of route descriptions is a fundamental task of navigation systems. A particular problem in this context is to identify routes that can easily be described and processed by users. In this work, we present a framework for representing route networks with the qualitative information necessary to evaluate and optimize route descriptions with regard to ambiguities in them. We identify different agent models that differ in how agents are assumed to process route descriptions while navigating through route networks. Further, we analyze the computational complexity of matching route descriptions and paths in route networks in dependency of the agent model. Finally we empirically evaluate the influence of the agent model on the optimization and the processing of route instructions.
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)