AirScape: An Aerial Generative World Model with Motion Controllability

Zhao, Baining, Tang, Rongze, Jia, Mingyuan, Wang, Ziyou, Man, Fanghang, Zhang, Xin, Shang, Yu, Zhang, Weichen, Wu, Wei, Gao, Chen, Chen, Xinlei, Li, Yong

Oct-13-2025–arXiv.org Artificial Intelligence

How to enable agents to predict the outcomes of their own motion intentions in three-dimensional space has been a fundamental problem in embodied intelligence. To explore general spatial imagination capability, we present AirScape, the first world model designed for six-degree-of-freedom aerial agents. AirScape predicts future observation sequences based on current visual inputs and motion intentions. Specifically, we construct a dataset for aerial world model training and testing, which consists of 11k video-intention pairs. This dataset includes first-person-view videos capturing diverse drone actions across a wide range of scenarios, with over 1,000 hours spent annotating the corresponding motion intentions. Then we develop a two-phase schedule to train a foundation model--initially devoid of embodied spatial knowledge--into a world model that is controllable by motion intentions and adheres to physical spatio-temporal constraints. Experimental results demonstrate that AirScape significantly outperforms existing foundation models in 3D spatial imagination capabilities, especially with over a 50% improvement in metrics reflecting motion alignment. The project is available at: https://embodiedcity.github.io/AirScape/.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

Oct-13-2025

arXiv.org PDF

Add feedback

Country:
- Europe (0.47)
- Asia > China
  - Guangdong Province (0.15)

Genre:
- Research Report (0.84)

Industry:
- Information Technology (0.47)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Spatial Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Cognitive Science > Problem Solving (1.00)
  - Robots > Autonomous Vehicles
    - Drones (0.46)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found