Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization

Li, Yang; Zhang, Ruichen; Liu, Yinqiu; Liu, Guangyuan; Niyato, Dusit; Jamalipour, Abbas; Wang, Xianbin; Kim, Dong In

arXiv.org Artificial Intelligence 

Abstract: The rapid advancement of Low-Altitude Economy Networks (LAENets) has enabled a variety of applications, including aerial surveillance, environmental sensing, and semantic data collection. To support these scenarios, unmanned aerial vehicles (UAVs) equipped with onboard vision-language models (VLMs) offer a promising solution for real-time multimodal inference. However, ensuring both inference accuracy and communication efficiency remains a significant challenge due to limited onboard resources and dynamic network conditions. In this paper, we first propose a UAV-enabled LAENet system model that jointly captures UAV mobility, user-UAV communication, and the onboard visual question answering (VQA) pipeline. Based on this model, we formulate a mixed-integer non-convex optimization problem to minimize task latency and power consumption under user-specific accuracy constraints. To solve this problem, we design a hierarchical optimization framework composed of two parts: (i) an Alternating Resolution and Power Optimization (ARPO) algorithm for resource allocation under accuracy constraints, and (ii) a Large Language Model-augmented Reinforcement Learning Approach (LLaRA) for adaptive UAV trajectory optimization. The large language model (LLM) acts as an expert that refines the reward design of reinforcement learning offline, so it introduces no additional latency into real-time decision-making. Numerical results demonstrate the efficacy of the proposed framework in improving inference performance and communication efficiency under dynamic LAENet conditions.

Low-Altitude Economy Networks (LAENets) have recently garnered growing attention as a novel paradigm that leverages low-altitude airspace (typically below 1000 meters) to deliver digital services [1].

Li and G. Liu are with the College of Computing and Data Science, the Energy Research Institute @ NTU, Interdisciplinary Graduate Program, Nanyang Technological University, Singapore (e-mail: yang048@e.ntu.edu.sg; Liu and D. Niyato are with the College of Computing and Data Science, Nanyang Technological University, Singapore (e-mails: ruichen.zhang@ntu.edu.sg; X. Wang is with the Department of Electrical and Computer Engineering, Western University, London, Canada (e-mail: xianbin.wang@uwo.ca).
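The abstract describes ARPO only at a high level, as alternating between image resolution (a discrete choice) and transmit power (a continuous choice) under an accuracy constraint. The following is a minimal sketch of that generic alternating, block-coordinate pattern. It uses a hypothetical single-user toy model and is not the paper's formulation. Every name and number here is an illustrative assumption: `ToyParams`, the resolution-to-accuracy table, the Shannon-rate upload model, the per-pixel inference time, and the weighted latency-plus-power objective. The resolution step enumerates the feasible discrete set. The power step runs a golden-section search, which is valid because $1/\log(1+x)$ is convex for $x>0$, so the objective is convex in $p$.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyParams:
    """Hypothetical single-user parameters; none are taken from the paper."""
    resolutions: tuple = (224, 336, 448, 672)   # candidate square image sides (pixels)
    accuracy: tuple = (0.62, 0.70, 0.76, 0.80)  # assumed VQA accuracy per resolution
    bits_per_pixel: float = 0.5                  # assumed compressed image size
    bandwidth_hz: float = 1e6                    # uplink bandwidth B
    gain_over_noise: float = 1e3                 # channel gain / noise power (1/W)
    p_max_w: float = 1.0                         # transmit power budget (W)
    infer_s_per_pixel: float = 2e-7              # assumed onboard VLM time per pixel
    power_weight: float = 0.5                    # weight on power in the objective (s/W)


def latency(prm, r, p):
    """Upload time (Shannon rate) plus onboard inference time for an r x r image."""
    bits = prm.bits_per_pixel * r * r
    rate = prm.bandwidth_hz * math.log2(1.0 + p * prm.gain_over_noise)
    return bits / rate + prm.infer_s_per_pixel * r * r


def objective(prm, r, p):
    """Weighted sum of task latency and transmit power (assumed objective)."""
    return latency(prm, r, p) + prm.power_weight * p


def best_resolution(prm, p, a_min):
    """Resolution step: pick the best resolution among those meeting a_min."""
    feasible = [r for r, a in zip(prm.resolutions, prm.accuracy) if a >= a_min]
    if not feasible:
        raise ValueError("no resolution satisfies the accuracy constraint")
    return min(feasible, key=lambda r: objective(prm, r, p))


def best_power(prm, r, tol=1e-9):
    """Power step: golden-section search over (0, p_max] for fixed r."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 1e-9, prm.p_max_w
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if objective(prm, r, c) < objective(prm, r, d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return (lo + hi) / 2.0


def alternating_optimize(prm, a_min, max_iter=20, tol=1e-12):
    """Alternate the two block updates until the objective stops decreasing."""
    p = prm.p_max_w
    r = best_resolution(prm, p, a_min)
    prev = objective(prm, r, p)
    for _ in range(max_iter):
        p = best_power(prm, r)
        r = best_resolution(prm, p, a_min)
        cur = objective(prm, r, p)
        if prev - cur < tol:
            break
        prev = cur
    return r, p, objective(prm, r, p)


if __name__ == "__main__":
    print(alternating_optimize(ToyParams(), a_min=0.75))
```

With `a_min=0.75`, the sketch selects resolution 448, the smallest size that meets the threshold, together with an interior power level well below `p_max_w`. In this toy model the objective grows with resolution for any fixed power, so the loop settles after a single power update. A model with shared bandwidth or per-task deadlines would make the two blocks interact more strongly.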