
AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments

Neural Information Processing Systems

Similar to audio-visual navigation tasks, the goal of our embodied agent is to localize an audio event via navigating the 3D visual world; however, the agent may also seek help from a human (oracle), where the assistance is provided in free-form natural language.



NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions

Yang, Haolin, Long, Yuxing, Yu, Zhuoyuan, Yang, Zihan, Wang, Minghan, Xu, Jiapeng, Wang, Yihan, Yu, Ziyan, Cai, Wenzhe, Kang, Lei, Dong, Hao

arXiv.org Artificial Intelligence

Instruction-following navigation is a key step toward embodied intelligence. Prior benchmarks mainly focus on semantic understanding but overlook systematically evaluating navigation agents' spatial perception and reasoning capabilities. In this work, we introduce the NavSpace benchmark, which contains six task categories and 1,228 trajectory-instruction pairs designed to probe the spatial intelligence of navigation agents. On this benchmark, we comprehensively evaluate 22 navigation agents, including state-of-the-art navigation models and multimodal large language models. The evaluation results lift the veil on spatial intelligence in embodied navigation. Furthermore, we propose SNav, a new spatially intelligent navigation model. SNav outperforms existing navigation agents on NavSpace and real robot tests, establishing a strong baseline for future work.



Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents

Ma, Tianyi, Zhang, Yue, Wang, Zehao, Kordjamshidi, Parisa

arXiv.org Artificial Intelligence

Vision-and-Language Navigation (VLN) poses significant challenges for agents to interpret natural language instructions and navigate complex 3D environments. While recent progress has been driven by large-scale pre-training and data augmentation, current methods still struggle to generalize to unseen scenarios, particularly when complex spatial and temporal reasoning is required. In this work, we propose SkillNav, a modular framework that introduces structured, skill-based reasoning into Transformer-based VLN agents. Our method decomposes navigation into a set of interpretable atomic skills (e.g., Vertical Movement, Area and Region Identification, Stop and Pause), each handled by a specialized agent. To support targeted skill training without manual data annotation, we construct a synthetic dataset pipeline that generates diverse, linguistically natural, skill-specific instruction-trajectory pairs. We then introduce a novel training-free Vision-Language Model (VLM)-based router, which dynamically selects the most suitable agent at each time step by aligning sub-goals with visual observations and historical actions. SkillNav obtains competitive results on commonly-used benchmarks, and establishes state-of-the-art generalization on GSA-R2R, a benchmark with novel instruction styles and unseen environments.
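The abstract's core mechanism, a training-free router that picks a specialized skill agent for each sub-goal, can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: the names (`AtomicSkill`, `route`) are hypothetical, and a simple keyword-overlap score replaces the VLM's alignment of sub-goals with visual observations and history.

```python
# Hedged sketch of a skill-based router in the spirit of SkillNav.
# All names and the scoring rule are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AtomicSkill:
    name: str
    keywords: List[str]          # cue words a router might match in a sub-goal
    act: Callable[[str], str]    # specialized agent for this skill

def route(sub_goal: str, skills: List[AtomicSkill]) -> AtomicSkill:
    """Pick the skill whose cue words best match the current sub-goal.

    The paper scores agents with a training-free VLM against visual
    observations and action history; keyword overlap stands in for that score.
    """
    text = sub_goal.lower()
    return max(skills, key=lambda s: sum(kw in text for kw in s.keywords))

skills = [
    AtomicSkill("vertical_movement", ["stairs", "up", "down", "floor"],
                lambda g: "take_stairs"),
    AtomicSkill("area_identification", ["kitchen", "room", "area", "region"],
                lambda g: "scan_for_region"),
    AtomicSkill("stop_and_pause", ["stop", "wait", "pause"],
                lambda g: "halt"),
]

chosen = route("go up the stairs to the second floor", skills)
print(chosen.name)  # vertical_movement
```

In the actual framework the selected agent then produces the next action; here `act` is a placeholder callable to show the dispatch shape.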


TurnBack: A Geospatial Route Cognition Benchmark for Large Language Models through Reverse Route

Luo, Hongyi, Cheng, Qing, Matos, Daniel, Gadi, Hari Krishna, Zhang, Yanfeng, Liu, Lu, Wang, Yongliang, Zeller, Niclas, Cremers, Daniel, Meng, Liqiu

arXiv.org Artificial Intelligence

Humans can interpret geospatial information through natural language, while the geospatial cognition capabilities of Large Language Models (LLMs) remain underexplored. Prior research in this domain has been constrained by non-quantifiable metrics, limited evaluation datasets, and unclear research hierarchies. Therefore, we propose a large-scale benchmark and conduct a comprehensive evaluation of the geospatial route cognition of LLMs. We create a large-scale evaluation dataset comprising 36,000 routes from 12 metropolises worldwide. Then, we introduce PathBuilder, a novel tool for converting natural language instructions into navigation routes, and vice versa, bridging the gap between geospatial information and natural language. Finally, we propose a new evaluation framework and metrics to rigorously assess 11 state-of-the-art (SOTA) LLMs on the task of route reversal. The benchmark reveals that LLMs show limited ability to reverse routes: most reversed routes neither return to the starting point nor resemble the optimal route. Additionally, LLMs face challenges such as low robustness in route generation and high confidence in their incorrect answers. Code & Data available at: https://github.com/bghjmn32/EMNLP2025_Turnback
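The two failure modes the abstract reports, reversed routes that do not return to the start and routes dissimilar to the optimal one, suggest two simple checks. The sketch below is a hypothetical illustration of such metrics, not the benchmark's actual evaluation framework; the function names, tolerance, and overlap measure are assumptions.

```python
# Illustrative checks for route-reversal quality (assumed, not TurnBack's code).
import math

def endpoint_error(reversed_route, start, tol_m=25.0):
    """True if the reversed route's final waypoint lands within tol_m of start.

    Waypoints are (lat, lon) pairs in degrees. Uses an equirectangular
    approximation, which is adequate at city scale.
    """
    lat1, lon1 = reversed_route[-1]
    lat2, lon2 = start
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    dist_m = 6_371_000 * math.hypot(x, y)  # mean Earth radius in meters
    return dist_m <= tol_m

def route_overlap(candidate, optimal):
    """Fraction of the optimal route's waypoints the candidate also visits."""
    return len(set(candidate) & set(optimal)) / len(set(optimal))
```

For example, a reversed route ending at the original start passes `endpoint_error`, while one ending a kilometer away fails; `route_overlap` near 1.0 indicates close agreement with the optimal reverse route.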


Nav-R1: Reasoning and Navigation in Embodied Scenes

Liu, Qingxiang, Huang, Ting, Zhang, Zeyu, Tang, Hao

arXiv.org Artificial Intelligence

Embodied navigation requires agents to integrate perception, reasoning, and action for robust interaction in complex 3D environments. Existing approaches often suffer from incoherent and unstable reasoning traces that hinder generalization across diverse environments, and from difficulty balancing long-horizon semantic reasoning with low-latency control for real-time navigation. To address these challenges, we propose Nav-R1, an embodied foundation model that unifies dialogue, reasoning, planning, and navigation to enable intelligent interaction and task execution in 3D environments. We first construct Nav-CoT-110K, a large-scale dataset of step-by-step Chains-of-Thought (CoT) for embodied tasks, which enables cold-start initialization with structured reasoning. Building on this foundation, we design a GRPO-based reinforcement learning framework with three complementary rewards: format, understanding, and navigation, to improve structural adherence, semantic grounding, and path fidelity. Furthermore, we introduce a Fast-in-Slow reasoning paradigm, decoupling deliberate semantic reasoning from low-latency reactive control for efficient yet coherent navigation.
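The three-part reward the abstract describes (format, understanding, navigation) can be sketched as a weighted sum. Everything below is an assumed illustration: the `<think>`/`<answer>` response format is a convention common in GRPO-style pipelines but is not confirmed by the abstract, and the component functions and weights are hypothetical stand-ins for Nav-R1's actual reward design.

```python
# Hedged sketch of a three-component reward (assumed, not Nav-R1's code).
import re

def format_reward(response: str) -> float:
    """Structural adherence: 1.0 if the response keeps an assumed
    <think>...</think><answer>...</answer> shape, else 0.0."""
    pattern = r"<think>.*</think>\s*<answer>.*</answer>"
    return 1.0 if re.fullmatch(pattern, response, re.S) else 0.0

def understanding_reward(answer: str, reference: str) -> float:
    """Semantic grounding: exact match stands in for a learned scorer."""
    return 1.0 if answer == reference else 0.0

def navigation_reward(path, expert_path) -> float:
    """Path fidelity: fraction of expert steps the agent's path matches."""
    matches = sum(a == b for a, b in zip(path, expert_path))
    return matches / max(len(expert_path), 1)

def total_reward(response, answer, reference, path, expert_path,
                 w=(0.2, 0.4, 0.4)) -> float:
    """Weighted combination; the weights here are arbitrary placeholders."""
    return (w[0] * format_reward(response)
            + w[1] * understanding_reward(answer, reference)
            + w[2] * navigation_reward(path, expert_path))
```

In a GRPO setup, scalar rewards like this would score each sampled rollout, with group-relative advantages driving the policy update.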