Taking Flight with Dialogue: Enabling Natural Language Control for PX4-based Drone Agent
Lim, Shoon Kit, Chong, Melissa Jia Ying, Khor, Jing Huey, Ling, Ting Yang
arXiv.org Artificial Intelligence
Recent advances in agentic and physical Artificial Intelligence (AI) have largely focused on ground-based platforms, such as humanoid and wheeled robots, leaving aerial robots relatively underexplored. At the same time, state-of-the-art UAV multimodal vision-language systems typically depend on closed-source models accessible only to well-resourced organizations. To democratize natural language control of autonomous drones, an open-source agentic framework is presented that integrates PX4-based flight control, Robot Operating System 2 (ROS2) middleware, and locally hosted models using Ollama. Performance is evaluated both in simulation and on a custom quadcopter platform, benchmarking four Large Language Model (LLM) families for command generation and three Vision Language Model (VLM) families for scene understanding. Results indicate that three of the LLMs (Gemma3, Qwen2.5, and Llama-3.2) consistently produced 100% valid flight commands, while DeepSeek-LLM performed considerably worse, at 38%. All three VLMs assessed (Gemma3, Llama3.2-Vision, and Llava1.6) were able to detect the presence of specified objects, giving valid binary responses at rates ranging from 97% to 100%.
Jun-10-2025
- Country:
  - Asia
    - Malaysia > Johor
      - Johor Darul Takim (0.05)
    - Middle East > UAE
      - Dubai Emirate > Dubai (0.04)
  - Europe > United Kingdom
    - England > Greater London > London (0.04)
- Genre:
  - Research Report (0.65)
- Industry:
  - Information Technology (0.94)
- Technology: