Goto

Collaborating Authors

 uav


AGC-Drive: ALarge-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios

Neural Information Processing Systems

By sharing information across multiple agents, collaborative perception helps autonomous vehicles mitigate occlusions and improve overall perception accuracy. While most previous work focus on vehicle-to-vehicle and vehicle-to-infrastructure collaboration, with limited attention to aerial perspectives provided by UAVs, which uniquely offer dynamic, top-down views to alleviate occlusions and monitor large-scale interactive environments. A major reason for this is the lack of highquality datasets for aerial-ground collaborative scenarios. To bridge this gap, we present AGC-Drive, the first large-scale real-world dataset for Aerial-Ground Cooperative 3D perception. The data collection platform consists of two vehicles, each equipped with five cameras and one LiDAR sensor, and one UAV carrying a forward-facing camera and a LiDAR sensor, enabling comprehensive multi-view and multi-agent perception.


HiFTTCTrack AVTrack Ours Ground Truth

Neural Information Processing Systems

UAV tracking can be widely applied in scenarios such as disaster rescue, environmental monitoring, and logistics transportation. However, existing UAV tracking methods predominantly emphasize speed and lack exploration in semantic awareness, which hinders the search region from extracting accurate localization information from the template.


New research enables a robot to chart a better course

Robohub

In the aftermath of a devastating earthquake, unpiloted aerial vehicles (UAVs) could fly through a collapsed building to map the scene, giving rescuers information they need to quickly reach survivors. But this remains an extremely challenging problem for an autonomous robot, which would need to swiftly adjust its trajectory to avoid sudden obstacles while staying on course. Researchers from MIT and the University of Pennsylvania developed a new trajectory-planning system that tackles both challenges at once. Their technique enables a UAV to react to obstacles in milliseconds while staying on a smooth flight path that minimizes travel time. Their system uses a new mathematical formulation that ensures the robot travels safely to its destination along a feasible path, and that is less computationally intensive than other techniques.


Multi-Task Bayesian Optimization for Tuning Decentralized Trajectory Generation in Multi-UAV Systems

arXiv.org Artificial Intelligence

We treat each task as a trajectory generation scenario defined by a specific number of drone-to-drone interactions. To model relationships across scenarios, we employ Multi-Task Gaussian Processes, which capture shared structure across tasks and enable efficient information transfer during optimization. We compare two strategies: optimizing the average mission time across all tasks and optimizing each task individually. Through a comprehensive simulation campaign, we show that single-task optimization leads to progressively shorter mission times as swarm size grows, but requires significantly more optimization time than the average-task approach. Keywords: Multi-Task Bayesian Optimization; Gaussian Processes; Multi-agent systems; UAV; Trajectory generation 1. INTRODUCTION In recent years, research efforts and real-world applications of Unmanned Aerial Vehicles (UAVs) have increasingly shifted from single-agent to multi-agent systems.


Chat with UAV -- Human-UAV Interaction Based on Large Language Models

arXiv.org Artificial Intelligence

The future of UAV interaction systems is evolving from engineer-driven to user-driven, aiming to replace traditional predefined Human-UAV Interaction designs. This shift focuses on enabling more personalized task planning and design, thereby achieving a higher quality of interaction experience and greater flexibility, which can be used in many fileds, such as agriculture, aerial photography, logistics, and environmental monitoring. However, due to the lack of a common language between users and the UAVs, such interactions are often difficult to be achieved. The developments of Large Language Models possess the ability to understand nature languages and Robots' (UAVs') behaviors, marking the possibility of personalized Human-UAV Interaction. Recently, some HUI frameworks based on LLMs have been proposed, but they commonly suffer from difficulties in mixed task planning and execution, leading to low adaptability in complex scenarios. In this paper, we propose a novel dual-agent HUI framework. This framework constructs two independent LLM agents (a task planning agent, and an execution agent) and applies different Prompt Engineering to separately handle the understanding, planning, and execution of tasks. To verify the effectiveness and performance of the framework, we have built a task database covering four typical application scenarios of UAVs and quantified the performance of the HUI framework using three independent metrics. Meanwhile different LLM models are selected to control the UAVs with compared performance. Our user study experimental results demonstrate that the framework improves the smoothness of HUI and the flexibility of task execution in the tasks scenario we set up, effectively meeting users' personalized needs.


IGUANA: Immersive Guidance, Navigation, and Control for Consumer UAV

arXiv.org Artificial Intelligence

As the markets for unmanned aerial vehicles (UAVs) and mixed reality (MR) headsets continue to grow, recent research has increasingly explored their integration, which enables more intuitive, immersive, and situationally aware control systems. We present IGUANA, an MR-based immersive guidance, navigation, and control system for consumer UAVs. IGUANA introduces three key elements beyond conventional control interfaces: (1) a 3D terrain map interface with draggable waypoint markers and live camera preview for high-level control, (2) a novel spatial control metaphor that uses a virtual ball as a physical analogy for low-level control, and (3) a spatial overlay that helps track the UAV when it is not visible with the naked eye or visual line of sight is interrupted. We conducted a user study to evaluate our design, both quantitatively and qualitatively, and found that (1) the 3D map interface is intuitive and easy to use, relieving users from manual control and suggesting improved accuracy and consistency with lower perceived workload relative to conventional dual-stick controller, (2) the virtual ball interface is intuitive but limited by the lack of physical feedback, and (3) the spatial overlay is very useful in enhancing the users' situational awareness.


BEDI: A Comprehensive Benchmark for Evaluating Embodied Agents on UAVs

arXiv.org Artificial Intelligence

With the rapid advancement of low-altitude remote sensing and Vision-Language Models (VLMs), Embodied Agents based on Unmanned Aerial Vehicles (UAVs) have shown significant potential in autonomous tasks. However, current evaluation methods for UAV-Embodied Agents (UAV-EAs) remain constrained by the lack of standardized benchmarks, diverse testing scenarios and open system interfaces. To address these challenges, we propose BEDI (Benchmark for Embodied Drone Intelligence), a systematic and standardized benchmark designed for evaluating UAV-EAs. Specifically, we introduce a novel Dynamic Chain-of-Embodied-Task paradigm based on the perception-decision-action loop, which decomposes complex UAV tasks into standardized, measurable subtasks. Building on this paradigm, we design a unified evaluation framework encompassing six core sub-skills: semantic perception, spatial perception, motion control, tool utilization, task planning and action generation. Furthermore, we develop a hybrid testing platform that incorporates a wide range of both virtual and real-world scenarios, enabling a comprehensive evaluation of UAV-EAs across diverse contexts. The platform also offers open and standardized interfaces, allowing researchers to customize tasks and extend scenarios, thereby enhancing flexibility and scalability in the evaluation process. Finally, through empirical evaluations of several state-of-the-art (SOTA) VLMs, we reveal their limitations in embodied UAV tasks, underscoring the critical role of the BEDI benchmark in advancing embodied intelligence research and model optimization. By filling the gap in systematic and standardized evaluation within this field, BEDI facilitates objective model comparison and lays a robust foundation for future development in this field. Our benchmark is now publicly available at https://github.com/lostwolves/BEDI.


Optimal Safety-Aware Scheduling for Multi-Agent Aerial 3D Printing with Utility Maximization under Dependency Constraints

arXiv.org Artificial Intelligence

Abstract--This article presents a novel coordination and task-planning framework to enable the simultaneous conflict-free collaboration of multiple unmanned aerial vehicles (UA Vs) for aerial 3D printing. The proposed framework formulates an optimization problem that takes a construction mission divided into sub-tasks and a team of autonomous UA Vs, along with limited volume and battery. It generates an optimal mission plan comprising task assignments and scheduling, while accounting for task dependencies arising from the geometric and structural requirements of the 3D design, inter-UA V safety constraints, material usage and total flight time of each UA V. The potential conflicts occurring during the simultaneous operation of the UA Vs are addressed at a segment-level by dynamically selecting the starting time and location of each task to guarantee collision-free parallel execution. An importance prioritization is proposed to accelerate the computation by guiding the solution towards more important tasks. Additionally, a utility maximization formulation is proposed to dynamically determine the optimal number of UA Vs required for a given mission, balancing the trade-off between minimizing makespan and the deployment of excess agents. The proposed framework's effectiveness is evaluated through a Gazebo-based simulation setup, where agents are coordinated by a mission control module allocating the printing tasks based on the generated optimal scheduling plan while remaining within the material and battery constraints of each UA V. A video of the whole mission is available in the following link: https://youtu.be/b4jwhkNPT Note to Practitioners--This framework addresses the critical need for efficiency and safety in planning and scheduling multiple aerial robots for parallel aerial 3D printing. Existing approaches lack safety guarantees for UA Vs during parallel construction. This work tackles these challenges by ensuring safety during parallel operations and effectively managing task dependencies.


Mobility Induced Sensitivity of UAV based Nodes to Jamming in Private 5G Airfield Networks An Experimental Study

arXiv.org Artificial Intelligence

This work presents an e xperimental performance evaluation of a p rivate 5G a irfield n etwork under controlled directional SDR jamming attacks targeting UAV - based UE nodes . Using a QualiPoc Android UE, mounted as a payload on a quad-copter UAV, we conducted a series of experiments to evaluate signal degradation, handover performance, and service stability in the presence of constant directional jamming. The conducted experiments aimed to examin e the effe c t s of varying travel speed s, altitudes, and moving patterns of a UAV - based UE to record and analyze the key physical - layer and network - layer metrics such as CQI, MCS, RSRP, SINR, BLER, Net PDSCH Throughput and RLF. The results of this work describe the link stability and signal degradation dependencies, caused by the level of mobility of the UAV - based UE nodes during autonomous and automatic operation in private 5G Airfield networks.


Agentic UAVs: LLM-Driven Autonomy with Integrated Tool-Calling and Cognitive Reasoning

arXiv.org Artificial Intelligence

Unmanned Aerial Vehicles (UAVs) are increasingly used in defense, surveillance, and disaster response, yet most systems still operate at SAE Level 2 to 3 autonomy. Their dependence on rule-based control and narrow AI limits adaptability in dynamic and uncertain missions. Current UAV architectures lack context-aware reasoning, autonomous decision-making, and integration with external systems. Importantly, none make use of Large Language Model (LLM) agents with tool-calling for real-time knowledge access. This paper introduces the Agentic UAVs framework, a five-layer architecture consisting of Perception, Reasoning, Action, Integration, and Learning. The framework enhances UAV autonomy through LLM-driven reasoning, database querying, and interaction with third-party systems. A prototype built with ROS 2 and Gazebo combines YOLOv11 for object detection with GPT-4 for reasoning and a locally deployed Gemma 3 model. In simulated search-and-rescue scenarios, agentic UAVs achieved higher detection confidence (0.79 compared to 0.72), improved person detection rates (91% compared to 75%), and a major increase in correct action recommendations (92% compared to 4.5%). These results show that modest computational overhead can enable significantly higher levels of autonomy and system-level integration.