Drones
UN warns of potential 'ethnically driven' atrocities in Sudan's el-Fasher
UN warns of potential'ethnically driven' atrocities in Sudan's el-Fasher At least 91 people have been killed in Sudan's besieged city of el-Fasher in attacks by the paramilitary Rapid Support Forces (RSF) over 10 days last month, the United Nations says. The attacks took place during intensified fighting between the RSF and Sudan's army around the city, the largest urban centre in the Darfur region that remains under the control of the military and its allies, known as the Joint Forces. UN rights chief Volker Turk said on Thursday that the city's Daraja Oula neighbourhood was repeatedly attacked and subjected to RSF artillery shelling, drone strikes and ground incursions from September 19 to 29. He called for urgent action to prevent "large-scale, ethnically driven attacks and atrocities in el-Fasher." He said "atrocities are not inevitable", adding that "they can be averted if all actors take concrete action to uphold international law, demand respect for civilian life and property, and prevent the continued commission of atrocity crimes".
A Hierarchical Agentic Framework for Autonomous Drone-Based Visual Inspection
Herron, Ethan, Lee, Xian Yeow, Sin, Gregory, Diaz, Teresa Gonzalez, Farahat, Ahmed, Gupta, Chetan
Autonomous inspection systems are essential for ensuring the performance and longevity of industrial assets. Recently, agentic frameworks have demonstrated significant potential for automating inspection workflows but have been limited to digital tasks. Their application to physical assets in real-world environments, however, remains underexplored. In this work, our contributions are two-fold: first, we propose a hierarchical agentic framework for autonomous drone control, and second, a reasoning methodology for individual function executions which we refer to as ReActEval. Our framework focuses on visual inspection tasks in indoor industrial settings, such as interpreting industrial readouts or inspecting equipment. It employs a multi-agent system comprising a head agent and multiple worker agents, each controlling a single drone. The head agent performs high-level planning and evaluates outcomes, while worker agents implement ReActEval to reason over and execute low-level actions. Operating entirely in natural language, ReActEval follows a plan, reason, act, evaluate cycle, enabling drones to handle tasks ranging from simple navigation (e.g., flying forward 10 meters and land) to complex high-level tasks (e.g., locating and reading a pressure gauge). The evaluation phase serves as a feedback and/or replanning stage, ensuring actions align with user objectives while preventing undesirable outcomes. We evaluate the framework in a simulated environment with two worker agents, assessing performance qualitatively and quantitatively based on task completion across varying complexity levels and workflow efficiency. By leveraging natural language processing for agent communication, our approach offers a novel, flexible, and user-accessible alternative to traditional drone-based solutions, enabling autonomous problem-solving for industrial inspection without extensive user intervention.
Robust Attitude Control of Nonlinear Multi-Rotor Dynamics with LFT Models and $\mathcal{H}_\infty$ Performance
Kumar, Tanay, Bhattacharya, Raktim
Attitude stabilization of unmanned aerial vehicles in uncertain environments presents significant challenges due to nonlinear dynamics, parameter variations, and sensor limitations. This paper presents a comparative study of $\mathcal{H}_\infty$ and classical PID controllers for multi-rotor attitude regulation in the presence of wind disturbances and gyroscope noise. The flight dynamics are modeled using a linear parameter-varying (LPV) framework, where nonlinearities and parameter variations are systematically represented as structured uncertainties within a linear fractional transformation formulation. A robust controller based on $\mathcal{H}_\infty$ formulation is designed using only gyroscope measurements to ensure guaranteed performance bounds. Nonlinear simulation results demonstrate the effectiveness of the robust controllers compared to classical PID control, showing significant improvement in attitude regulation under severe wind disturbances.
CHAI: Command Hijacking against embodied AI
Burbano, Luis, Ortiz, Diego, Sun, Qi, Yang, Siwei, Tu, Haoqin, Xie, Cihang, Cao, Yinzhi, Cardenas, Alvaro A
Embodied Artificial Intelligence (AI) promises to handle edge cases in robotic vehicle systems where data is scarce by using common-sense reasoning grounded in perception and action to generalize beyond training distributions and adapt to novel real-world situations. These capabilities, however, also create new security risks. In this paper, we introduce CHAI (Command Hijacking against embodied AI), a new class of prompt-based attacks that exploit the multimodal language interpretation abilities of Large Visual-Language Models (LVLMs). CHAI embeds deceptive natural language instructions, such as misleading signs, in visual input, systematically searches the token space, builds a dictionary of prompts, and guides an attacker model to generate Visual Attack Prompts. We evaluate CHAI on four LVLM agents; drone emergency landing, autonomous driving, and aerial object tracking, and on a real robotic vehicle. Our experiments show that CHAI consistently outperforms state-of-the-art attacks. By exploiting the semantic and multimodal reasoning strengths of next-generation embodied AI systems, CHAI underscores the urgent need for defenses that extend beyond traditional adversarial robustness.
Drones that Think on their Feet: Sudden Landing Decisions with Embodied AI
Barbosa, Diego Ortiz, Agrawal, Mohit, Malegaonkar, Yash, Burbano, Luis, Andersson, Axel, Dรกn, Gyรถrgy, Sandberg, Henrik, Cardenas, Alvaro A.
Autonomous drones must often respond to sudden events, such as alarms, faults, or unexpected changes in their environment, that require immediate and adaptive decision-making. Traditional approaches rely on safety engineers hand-coding large sets of recovery rules, but this strategy cannot anticipate the vast range of real-world contingencies and quickly becomes incomplete. Recent advances in embodied AI, powered by large visual language models, provide commonsense reasoning to assess context and generate appropriate actions in real time. We demonstrate this capability in a simulated urban benchmark in the Unreal Engine, where drones dynamically interpret their surroundings and decide on sudden maneuvers for safe landings. Our results show that embodied AI makes possible a new class of adaptive recovery and decision-making pipelines that were previously infeasible to design by hand, advancing resilience and safety in autonomous aerial systems.
Learning to Lead Themselves: Agentic AI in MAS using MARL
As autonomous systems move from prototypes to real deployments, the ability of multiple agents to make decentralized, cooperative decisions becomes a core requirement. This paper examines how agentic artificial intelligence, agents that act independently, adaptively and proactively can improve task allocation and coordination in multi-agent systems, with primary emphasis on drone delivery and secondary relevance to warehouse automation. We formulate the problem in a cooperative multi-agent reinforcement learning setting and implement a lightweight multi-agent Proximal Policy Optimization, called IPPO, approach in PyTorch under a centralized-training, decentralized-execution paradigm. Experiments are conducted in PettingZoo environment, where multiple homogeneous drones or agents must self-organize to cover distinct targets without explicit communication.
TDBench: A Benchmark for Top-Down Image Understanding with Reliability Analysis of Vision-Language Models
Hou, Kaiyuan, Zhao, Minghui, Xu, Lilin, Fan, Yuang, Jiang, Xiaofan
Top-down images play an important role in safety-critical settings such as autonomous navigation and aerial surveillance, where they provide holistic spatial information that front-view images cannot capture. Despite this, Vision Language Models (VLMs) are mostly trained and evaluated on front-view benchmarks, leaving their performance in the top-down setting poorly understood. Existing evaluations also overlook a unique property of top-down images: their physical meaning is preserved under rotation. In addition, conventional accuracy metrics can be misleading, since they are often inflated by hallucinations or "lucky guesses", which obscures a model's true reliability and its grounding in visual evidence. To address these issues, we introduce TDBench, a benchmark for top-down image understanding that includes 2000 curated questions for each rotation. We further propose RotationalEval (RE), which measures whether models provide consistent answers across four rotated views of the same scene, and we develop a reliability framework that separates genuine knowledge from chance. Finally, we conduct four case studies targeting underexplored real-world challenges. By combining rigorous evaluation with reliability metrics, TDBench not only benchmarks VLMs in top-down perception but also provides a new perspective on trustworthiness, guiding the development of more robust and grounded AI systems. Project homepage: https://github.com/Columbia-ICSL/TDBench
ROSplane 2.0: A Fixed-Wing Autopilot for Research
Reid, Ian, Ritchie, Joseph, Moore, Jacob, Sutherland, Brandon, Snow, Gabe, Tokumaru, Phillip, McLain, Tim
Unmanned aerial vehicle (UAV) research requires the integration of cutting-edge technology into existing autopilot frameworks. This process can be arduous, requiring extensive resources, time, and detailed knowledge of the existing system. ROSplane is a lean, open-source fixed-wing autonomy stack built by researchers for researchers. It is designed to accelerate research by providing clearly defined interfaces with an easily modifiable framework. Powered by ROS 2, ROSplane allows for rapid integration of low or high-level control, path planning, or estimation algorithms. A focus on lean, easily understood code and extensive documentation lowers the barrier to entry for researchers. Recent developments to ROSplane improve its capacity to accelerate UAV research, including the transition from ROS 1 to ROS 2, enhanced estimation and control algorithms, increased modularity, and an improved aerodynamic modeling pipeline. This aerodynamic modeling pipeline significantly reduces the effort of transitioning from simulation to real-world testing without requiring expensive system identification or computational fluid dynamics tools. ROSplane's architecture reduces the effort required to integrate new research tools and methods, expediting hardware experimentation.
ROSflight 2.0: Lean ROS 2-Based Autopilot for Unmanned Aerial Vehicles
Moore, Jacob, Tokumaru, Phil, Reid, Ian, Sutherland, Brandon, Ritchie, Joseph, Snow, Gabe, McLain, Tim
ROSflight 2.0: Lean ROS 2-Based Autopilot for Unmanned Aerial V ehicles Abstract-- ROSflight is a lean, open-source autopilot ecosystem for unmanned aerial vehicles (UA Vs). Designed by researchers for researchers, it is built to lower the barrier to entry to UA V research and accelerate the transition from simulation to hardware experiments by maintaining a lean (not full-featured), well-documented, and modular codebase. This publication builds on previous treatments and describes significant additions to the architecture that improve the modularity and usability of ROSflight, including the transition from ROS 1 to ROS 2, supported hardware, low-level actuator mixing, and the simulation environment. We believe that these changes improve the usability of ROSflight and enable ROSflight to accelerate research in areas like advanced-air mobility. Hardware results are provided, showing that ROSflight is able to control a multirotor over a serial connection at 400 Hz while closing all control loops on the companion computer . In recent years, interest in unmanned aerial vehicles (UA Vs) has increased significantly. Technological advances have enabled numerous applications of UA Vs, including package delivery, photography, search-and-rescue, firefighting, as well as military applications. Advanced air mobility (AAM), a category broadly referring to increasing autonomy in urban areas for civilian use, is also currently an area of high interest.
Enabling High-Frequency Cross-Modality Visual Positioning Service for Accurate Drone Landing
Wang, Haoyang, Luo, Xinyu, Ding, Wenhua, Xu, Jingao, Chen, Xuecheng, Duan, Ruiyang, Chen, Jialong, Zhang, Haitao, Liu, Yunhao, Chen, Xinlei
After years of growth, drone-based delivery is transforming logistics. At its core, real-time 6-DoF drone pose tracking enables precise flight control and accurate drone landing. With the widespread availability of urban 3D maps, the Visual Positioning Service (VPS), a mobile pose estimation system, has been adapted to enhance drone pose tracking during the landing phase, as conventional systems like GPS are unreliable in urban environments due to signal attenuation and multi-path propagation. However, deploying the current VPS on drones faces limitations in both estimation accuracy and efficiency. In this work, we redesign drone-oriented VPS with the event camera and introduce EV-Pose to enable accurate, high-frequency 6-DoF pose tracking for accurate drone landing. EV-Pose introduces a spatio-temporal feature-instructed pose estimation module that extracts a temporal distance field to enable 3D point map matching for pose estimation; and a motion-aware hierarchical fusion and optimization scheme to enhance the above estimation in accuracy and efficiency, by utilizing drone motion in the \textit{early stage} of event filtering and the \textit{later stage} of pose optimization. Evaluation shows that EV-Pose achieves a rotation accuracy of 1.34$\degree$ and a translation accuracy of 6.9$mm$ with a tracking latency of 10.08$ms$, outperforming baselines by $>$50\%, \tmcrevise{thus enabling accurate drone landings.} Demo: https://ev-pose.github.io/