Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning

Open in new window