 Drones


RaceVLA: VLA-based Racing Drone Navigation with Human-like Behaviour

arXiv.org Artificial Intelligence

RaceVLA presents an innovative approach for autonomous racing drone navigation by leveraging Visual-Language-Action (VLA) to emulate human-like behavior. This research explores the integration of advanced algorithms that enable drones to adapt their navigation strategies based on real-time environmental feedback, mimicking the decision-making processes of human pilots. The model, fine-tuned on a collected racing drone dataset, demonstrates strong generalization despite the complexity of drone racing environments. RaceVLA outperforms OpenVLA in motion (75.0 vs 60.0) and semantic generalization (45.5 vs 36.3), benefiting from the dynamic camera and simplified motion tasks. However, visual (79.6 vs 87.0) and physical (50.0 vs 76.7) generalization were lower than OpenVLA's, due to the challenges of maneuvering in dynamic environments with varying object sizes. RaceVLA also outperforms RT-2 across all axes: visual (79.6 vs 52.0), motion (75.0 vs 55.0), physical (50.0 vs 26.7), and semantic (45.5 vs 38.8), demonstrating its robustness for real-time adjustments in complex environments. Experiments revealed an average velocity of 1.04 m/s, with a maximum speed of 2.02 m/s, and consistent maneuverability, demonstrating RaceVLA's ability to handle high-speed scenarios effectively. These findings highlight the potential of RaceVLA for high-performance navigation in competitive racing contexts. The RaceVLA codebase, pretrained weights, and dataset are available at https://racevla.github.io/
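To make the reported comparison easier to read, the short Python sketch below tabulates the per-axis margins between RaceVLA and each baseline. The scores are copied from the abstract above; the dictionary layout and the `margins` helper are illustrative names, not part of the RaceVLA codebase.

```python
# Generalization scores as reported in the abstract (higher is better).
SCORES = {
    "visual":   {"RaceVLA": 79.6, "OpenVLA": 87.0, "RT-2": 52.0},
    "motion":   {"RaceVLA": 75.0, "OpenVLA": 60.0, "RT-2": 55.0},
    "physical": {"RaceVLA": 50.0, "OpenVLA": 76.7, "RT-2": 26.7},
    "semantic": {"RaceVLA": 45.5, "OpenVLA": 36.3, "RT-2": 38.8},
}


def margins(baseline):
    """Return RaceVLA's score minus the baseline's score on every axis."""
    return {
        axis: round(row["RaceVLA"] - row[baseline], 1)
        for axis, row in SCORES.items()
    }


if __name__ == "__main__":
    for baseline in ("OpenVLA", "RT-2"):
        print(baseline, margins(baseline))
```

Positive values mean RaceVLA scored higher than the baseline on that axis. The largest gap in either direction is on the physical axis against OpenVLA.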


UAV-VLRR: Vision-Language Informed NMPC for Rapid Response in UAV Search and Rescue

arXiv.org Artificial Intelligence

Emergency search and rescue (SAR) operations often require rapid and precise target identification in complex environments where traditional manual drone control is inefficient. This system consists of two aspects: 1) a multimodal system that combines a Visual Language Model (VLM) with the natural language processing capabilities of ChatGPT-4o (LLM) for scene interpretation. This work aims to improve response times in emergency SAR operations by giving the operator a more intuitive and natural way to plan the SAR mission while allowing the drone to carry out that mission rapidly and safely. When tested, our approach was on average 33.75% faster than an off-the-shelf autopilot and 54.6% faster than a human pilot. Search and rescue (SAR) operations in disaster-stricken and hazardous environments require fast and efficient situational assessment to locate survivors and critical infrastructure.


UAV-VLPA*: A Vision-Language-Path-Action System for Optimal Route Generation on a Large Scales

arXiv.org Artificial Intelligence

The UAV-VLPA* (Visual-Language-Planning-and-Action) system represents a cutting-edge advancement in aerial robotics, designed to enhance communication and operational efficiency for unmanned aerial vehicles (UAVs). By integrating advanced planning capabilities, the system addresses the Traveling Salesman Problem (TSP) to optimize flight paths, reducing the total trajectory length by 18.5% compared to traditional methods. Additionally, the incorporation of the A* algorithm enables robust obstacle avoidance, ensuring safe and efficient navigation in complex environments. The system leverages satellite imagery processing combined with the Visual Language Model (VLM) and GPT's natural language processing capabilities, allowing users to generate detailed flight plans through simple text commands. This seamless fusion of visual and linguistic analysis empowers precise decision-making and mission planning, making UAV-VLPA* a transformative tool for modern aerial operations. With its unmatched operational efficiency, navigational safety, and user-friendly functionality, UAV-VLPA* sets a new standard in autonomous aerial robotics, paving the way for future innovations in the field.
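The combination of TSP ordering and A* obstacle avoidance described above can be illustrated with a minimal Python sketch. It is not the UAV-VLPA* implementation: the occupancy grid, the 4-connected motion model, the brute-force tour search, and the names `astar` and `best_tour` are all assumptions made for illustration. A* supplies obstacle-aware leg lengths, and the tour search picks the visiting order with the shortest total length.

```python
import heapq
from itertools import permutations


def astar(grid, start, goal):
    """Shortest 4-connected path length on a 0/1 grid (1 = obstacle), or None."""
    rows, cols = len(grid), len(grid[0])

    def h(p):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    best = {start: 0}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g
        if g > best[node]:
            continue  # stale heap entry
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best.get(nxt, float("inf")):
                    best[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None


def best_tour(grid, start, waypoints):
    """Brute-force the visiting order that minimizes total A* path length.

    Only practical for a handful of waypoints; a real planner would use a
    dedicated TSP solver or heuristic.
    """
    pts = [start] + list(waypoints)
    cost = {(a, b): astar(grid, a, b) for a in pts for b in pts if a != b}
    best_order, best_len = None, float("inf")
    for order in permutations(waypoints):
        route = [start] + list(order)
        legs = [cost[(route[i], route[i + 1])] for i in range(len(route) - 1)]
        if any(leg is None for leg in legs):
            continue  # some leg is blocked by obstacles
        total = sum(legs)
        if total < best_len:
            best_order, best_len = list(order), total
    return best_order, best_len


# Hypothetical 4x5 map: 0 is free space, 1 is an obstacle.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]

if __name__ == "__main__":
    print(best_tour(GRID, (0, 0), [(3, 4), (2, 2), (0, 4)]))
```

On this toy map, visiting the waypoints in the order given costs 16 grid steps, while the best order costs 10. That is the kind of trajectory-length saving the TSP step targets.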


WalnutData: A UAV Remote Sensing Dataset of Green Walnuts and Model Evaluation

arXiv.org Artificial Intelligence

UAV technology is gradually maturing and can provide powerful support for smart agriculture and precise monitoring. Currently, there is no dataset of green walnuts in the field of agricultural computer vision. To promote algorithm design in this field, we used UAVs to collect remote-sensing data from 8 walnut sample plots. Considering that green walnuts are subject to various lighting conditions and occlusion, we constructed WalnutData, a large-scale dataset with finer-grained target features. The dataset contains a total of 30,240 images and 706,208 instances in 4 target categories: illuminated by frontal light and unoccluded (A1), backlit and unoccluded (A2), illuminated by frontal light and occluded (B1), and backlit and occluded (B2). Subsequently, we evaluated many mainstream algorithms on WalnutData and used the evaluation results as baselines. The dataset and all evaluation results can be obtained at https://github.com/1wuming/WalnutData.
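The four categories are the combinations of two binary attributes, lighting and occlusion. The small Python sketch below spells out that mapping and the average annotation density implied by the reported counts. The function names are hypothetical and are not taken from the WalnutData repository.

```python
def walnut_label(frontal_light: bool, occluded: bool) -> str:
    """Map the two attributes to the category codes listed in the abstract."""
    letter = "B" if occluded else "A"      # A = unoccluded, B = occluded
    digit = "1" if frontal_light else "2"  # 1 = frontal light, 2 = backlit
    return letter + digit


def instances_per_image(instances: int = 706_208, images: int = 30_240) -> float:
    """Average number of annotated walnuts per image, from the reported totals."""
    return instances / images


if __name__ == "__main__":
    print(walnut_label(frontal_light=False, occluded=True))
    print(round(instances_per_image(), 2))
```

The reported totals work out to roughly 23 annotated instances per image, so the detection task is a dense one.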


Amazon's Delivery Drones Are Grounded. The Birds and Dogs of This Texas Town Are Grateful

WIRED

As the spring planting season arrives in College Station, Texas, certified master gardener Mark Smith is thrilled that peace is in the air. This time last year, a loud buzzing noise began disrupting Smith's morning routine of checking on the peppers, tomatoes, herbs, and shrubs growing in his backyard. Several times an hour, an Amazon Prime Air delivery drone would noisily emerge about 800 feet away, just past a line of trees behind Smith's home. His neighbors began calling the fleet flying chainsaws. Smith, a retired civil engineer, preferred a different comparison: "It was like your neighbor runs their leaf blower all day long," he says.


Design and Control of A Tilt-Rotor Tailsitter Aircraft with Pivoting VTOL Capability

arXiv.org Artificial Intelligence

Tailsitter aircraft attract considerable interest due to their capabilities of both agile hover and high speed forward flight. However, traditional tailsitters that use aerodynamic control surfaces face the challenge of limited control effectiveness and associated actuator saturation during vertical flight and transitions. Conversely, tailsitters relying solely on tilting rotors have the drawback of insufficient roll control authority in forward flight. This paper proposes a tilt-rotor tailsitter aircraft with both elevons and tilting rotors as a promising solution. By implementing a cascaded weighted least squares (WLS) based incremental nonlinear dynamic inversion (INDI) controller, the drone successfully achieved autonomous waypoint tracking in outdoor experiments at a cruise airspeed of 16 m/s, including transitions between forward flight and hover without actuator saturation. Wind tunnel experiments confirm improved roll control compared to tilt-rotor-only configurations, while comparative outdoor flight tests highlight the vehicle's superior control over elevon-only designs during critical phases such as vertical descent and transitions. Finally, we also show that the tilt-rotors allow for an autonomous takeoff and landing with a unique pivoting capability that demonstrates stability and robustness under wind disturbances. Index Terms: VTOL aircraft, tailsitter UAV, incremental control, tilt rotors, autonomous flight.
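For readers unfamiliar with the controller named above, one stage of INDI with WLS control allocation is commonly written as follows. This is the generic textbook form, given as a sketch, and not necessarily the exact formulation used in the paper. All symbols are assumptions introduced here: $\nu$ is the commanded (virtual) acceleration, $\ddot{x}_0$ the measured acceleration, $G$ the control effectiveness matrix, $u_0$ the current actuator state, $W_v$ and $W_u$ weighting matrices, and $\gamma$ a small regularization weight.

```latex
\begin{aligned}
\Delta u^{*} &= \arg\min_{\Delta u}\;
  \bigl\lVert W_v \bigl( G\,\Delta u - (\nu - \ddot{x}_0) \bigr) \bigr\rVert_2^2
  + \gamma \,\bigl\lVert W_u\,\Delta u \bigr\rVert_2^2 \\
&\quad \text{subject to}\quad
  u_{\min} - u_0 \;\le\; \Delta u \;\le\; u_{\max} - u_0, \\
u &= u_0 + \Delta u^{*}.
\end{aligned}
```

The first term tracks the required acceleration increment, the second penalizes actuator effort, and the bounds keep the total command within actuator limits. With both elevons and tilting rotors available, $G$ has more columns than rows, so the weights decide how the demand is shared between the redundant actuators.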


Aerial Gym Simulator: A Framework for Highly Parallelized Simulation of Aerial Robots

arXiv.org Artificial Intelligence

With increasing deployment in a vast range of applications, including inspection, delivery, and search-and-rescue, aerial robots have gained immense popularity. Multi-rotor systems of varying scales have taken diverse roles and forms ranging from large vehicles with significant payload-carrying capacity to racing micro drones and reconfigurable robots capable of changing their shape actively or passively for traversal [1]-[4] or manipulation [5], [6]. Critically, each unique robot configuration requires addressing embodiment- and task-specific challenges in terms of control, sensing capabilities, perception, and planning. With changes in the number of propellers, structural materials, overall platform size, payloads, the onboard sensor suite, as well as the environment within which a system is expected to operate, autonomy design and optimization need to exploit high-end simulation toward a safer and faster path to resilient deployment.


CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs

arXiv.org Artificial Intelligence

Artem Lykov, Valerii Serpiva, Muhammad Haris Khan, Oleg Sautenkov, Artyom Myshlyaev, Grik Tadevosyan, Yasheerah Yaqoot, and Dzmitry Tsetserukou. Abstract: This paper introduces CognitiveDrone, a novel Vision-Language-Action (VLA) model tailored for complex Unmanned Aerial Vehicle (UAV) tasks that demand advanced cognitive abilities. Trained on a dataset comprising over 8,000 simulated flight trajectories across three key categories (Human Recognition, Symbol Understanding, and Reasoning), the model generates real-time 4D action commands based on first-person visual inputs and textual instructions. To further enhance performance in intricate scenarios, we propose CognitiveDrone-R1, which integrates an additional Vision-Language Model (VLM) reasoning module to simplify task directives prior to high-frequency control. Experimental evaluations using our open-source benchmark, CognitiveDroneBench, reveal that while a racing-oriented model (RaceVLA) achieves an overall success rate of 31.3%, the base CognitiveDrone model reaches 59.6%, and CognitiveDrone-R1 attains a success rate of 77.2%. These results demonstrate improvements of up to 30% in critical cognitive tasks, underscoring the effectiveness of incorporating advanced reasoning capabilities into UAV control systems. Our contributions include the development of a state-of-the-art VLA model for UAV control and the introduction of the first dedicated benchmark for assessing cognitive tasks in drone operations.


FLOAT Drone: A Fully-actuated Coaxial Aerial Robot for Close-Proximity Operations

arXiv.org Artificial Intelligence

How to endow aerial robots with the ability to operate in close proximity remains an open problem. The core challenges lie in the propulsion system's dual-task requirement: generating manipulation forces while simultaneously counteracting gravity. These competing demands create dynamic coupling effects during physical interactions. Furthermore, rotor-induced airflow disturbances critically undermine operational reliability. Although fully-actuated unmanned aerial vehicles (UAVs) alleviate dynamic coupling effects via six-degree-of-freedom (6-DoF) force-torque decoupling, existing implementations fail to address the aerodynamic interference between drones and environments. They also suffer from oversized designs, which compromise maneuverability and limit their applications in various operational scenarios. To address these limitations, we present FLOAT Drone (FuLly-actuated cOaxial Aerial roboT), a novel fully-actuated UAV featuring two key structural innovations. By integrating control surfaces into fully-actuated systems for the first time, we significantly suppress lateral airflow disturbances during operations. Furthermore, a coaxial dual-rotor configuration enables a compact size while maintaining high hovering efficiency. Through dynamic modeling, we have developed hierarchical position and attitude controllers that support both fully-actuated and underactuated modes. Experimental validation through comprehensive real-world experiments confirms the system's functional capabilities in close-proximity operations.


ATMO: An Aerially Transforming Morphobot for Dynamic Ground-Aerial Transition

arXiv.org Artificial Intelligence

Designing ground-aerial robots is challenging due to the increased actuation requirements which can lead to added weight and reduced locomotion efficiency. Morphobots mitigate this by combining actuators into multi-functional groups and leveraging ground transformation to achieve different locomotion modes. However, transforming on the ground requires dealing with the complexity of ground-vehicle interactions during morphing, limiting applicability on rough terrain. Mid-air transformation offers a solution to this issue but demands operating near or beyond actuator limits while managing complex aerodynamic forces. We address this problem by introducing the Aerially Transforming Morphobot (ATMO), a robot which transforms near the ground achieving smooth transition between aerial and ground modes. To achieve this, we leverage the near ground aerodynamics, uncovered by experimental load cell testing, and stabilize the system using a model-predictive controller that adapts to ground proximity and body shape. The system is validated through numerous experimental demonstrations. We find that ATMO can land smoothly at body postures past its actuator saturation limits by virtue of the uncovered ground-effect.