 Drones


3dSAGER: Geospatial Entity Resolution over 3D Objects (Technical Report)

arXiv.org Artificial Intelligence

Urban environments are continuously mapped and modeled by various data collection platforms, including satellites, unmanned aerial vehicles and street cameras. The growing availability of 3D geospatial data from multiple modalities has introduced new opportunities and challenges for integrating spatial knowledge at scale, particularly in high-impact domains such as urban planning and rapid disaster management. Geospatial entity resolution is the task of identifying matching spatial objects across different datasets, often collected independently under varying conditions. Existing approaches typically rely on spatial proximity, textual metadata, or external identifiers to determine correspondence. While useful, these signals are often unavailable, unreliable, or misaligned, especially in cross-source scenarios. To address these limitations, we shift the focus to the intrinsic geometry of 3D spatial objects and present 3dSAGER (3D Spatial-Aware Geospatial Entity Resolution), an end-to-end pipeline for geospatial entity resolution over 3D objects. 3dSAGER introduces a novel, spatial-reference-independent featurization mechanism that captures intricate geometric characteristics of matching pairs, enabling robust comparison even across datasets with incompatible coordinate systems where traditional spatial methods fail. As a key component of 3dSAGER, we also propose a new lightweight and interpretable blocking method, BKAFI, that leverages a trained model to efficiently generate high-recall candidate sets. We validate 3dSAGER through extensive experiments on real-world urban datasets, demonstrating significant gains in both accuracy and efficiency over strong baselines. Our empirical study further dissects the contributions of each component, providing insights into their impact and the overall design choices.


UAV-Assisted Resilience in 6G and Beyond Network Energy Saving: A Multi-Agent DRL Approach

arXiv.org Artificial Intelligence

This paper investigates the unmanned aerial vehicle (UAV)-assisted resilience perspective in the 6G network energy saving (NES) scenario. More specifically, we consider multiple ground base stations (GBSs), each with three different sectors/cells in the terrestrial network, where multiple cells are turned off due to NES or incidents, e.g., disasters, hardware failures, or outages. To address this, we propose a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) framework to enable UAV-assisted communication by jointly optimizing UAV trajectories, transmission power, and user-UAV association under a sleeping GBS strategy. This framework aims to ensure the resilience of active users in the network and the long-term operability of UAVs. Specifically, it maximizes service coverage for users during power outages or in NES zones, while minimizing the energy consumption of UAVs. Simulation results demonstrate that the proposed MADDPG policy consistently achieves a high coverage ratio across different testing episodes, outperforming other baselines. Moreover, the MADDPG framework attains the lowest total energy consumption, with a reduction of approximately 24% compared to the conventional all GBS ON configuration, while maintaining a comparable user service rate. These results confirm the effectiveness of the proposed approach in achieving a superior trade-off between energy efficiency and service performance, supporting the development of sustainable and resilient UAV-assisted cellular networks.


Sekai: A Video Dataset towards World Exploration

arXiv.org Artificial Intelligence

Video generation techniques have made remarkable progress, promising to be the foundation of interactive world exploration. However, existing video generation datasets are not well-suited for world exploration training as they suffer from some limitations: limited locations, short duration, static scenes, and a lack of annotations about exploration and the world. In this paper, we introduce Sekai (meaning "world" in Japanese), a high-quality first-person view worldwide video dataset with rich annotations for world exploration. It consists of over 5,000 hours of walking or drone view (FPV and UAV) videos from over 100 countries and regions across 750 cities. We develop an efficient and effective toolbox to collect, pre-process and annotate videos with location, scene, weather, crowd density, captions, and camera trajectories. Comprehensive analyses and experiments demonstrate the dataset's scale, diversity, annotation quality, and effectiveness for training video generation models. We believe Sekai will benefit the area of video generation and world exploration, and motivate valuable applications. The project page is https://lixsp11.github.io/sekai-project/.


Israeli drone strike kills two in Gaza as ceasefire violations mount

Al Jazeera

At least two people including a child have been killed in an Israeli drone strike east of Khan Younis in southern Gaza, according to Al Jazeera reporters in the besieged Palestinian territory. Hamas condemned Israel's "daily and continuous violations" since a truce came into effect last month, accusing it of maintaining a campaign of bombardments and demolitions across the besieged enclave. The Israeli military said the Palestinians killed on Monday posed "an immediate threat" to its forces. Israeli forces have also been systematically destroying homes inside the so-called "yellow line", a temporary withdrawal boundary agreed in the ceasefire.


Millions endure power cuts in Ukraine as Russia strikes more energy sites

Al Jazeera

Most regions of Ukraine are undergoing scheduled power outages amid a new wave of attacks on energy sites by Russian drones and missiles. Ukrenergo, the state-run electricity transmission systems operator in Ukraine, said the blackouts will last at least until the end of Monday as repairs are conducted on infrastructure damaged over the weekend and demand remains high as the onset of winter approaches. According to Ukraine's military, Russian forces used two air-launched ballistic missiles, five surface-to-air guided missiles and 67 drones, including those of Iranian design, during their attacks overnight into Monday. The Ukrainian army did not report shooting down any of the missiles, but it said 52 of the drones were intercepted and the remaining 15 conducted strikes on nine locations.


Is the fall of Pokrovsk, Ukraine's key eastern stronghold, inevitable?

Al Jazeera

Pokrovsk, a key fortress and logistical hub for Ukrainian forces in the eastern region of Donbas, has been under siege for almost two years. But in recent weeks, tens of thousands of Russian soldiers have been storming the town around the clock, taking over the streets where buildings are mostly reduced to bombed-out, deserted ruins. They use reconnaissance drones and satellite images to identify gaps in Ukrainian defences and use tiny groups of soldiers who are attacked and killed in droves by Ukrainian drones.


Search-TTA: A Multimodal Test-Time Adaptation Framework for Visual Search in the Wild

arXiv.org Artificial Intelligence

To perform outdoor visual navigation and search, a robot may leverage satellite imagery to generate visual priors. This can help inform high-level search strategies, even when such images lack sufficient resolution for target recognition. However, many existing informative path planning or search-based approaches either assume no prior information, or use priors without accounting for how they were obtained. Recent work instead utilizes large Vision Language Models (VLMs) for generalizable priors, but their outputs can be inaccurate due to hallucination, leading to inefficient search. To address these challenges, we introduce Search-TTA, a multimodal test-time adaptation framework with a flexible plug-and-play interface compatible with various input modalities (e.g., image, text, sound) and planning methods (e.g., RL-based). First, we pretrain a satellite image encoder to align with CLIP's visual encoder to output probability distributions of target presence used for visual search. Second, our TTA framework dynamically refines CLIP's predictions during search using uncertainty-weighted gradient updates inspired by Spatial Poisson Point Processes. To train and evaluate Search-TTA, we curate AVS-Bench, a visual search dataset based on internet-scale ecological data containing 380k images and taxonomy data. We find that Search-TTA improves planner performance by up to 30.0%, particularly in cases with poor initial CLIP predictions due to domain mismatch and limited training data. It also performs comparably with significantly larger VLMs, and achieves zero-shot generalization via emergent alignment to unseen modalities. Finally, we deploy Search-TTA on a real UAV via hardware-in-the-loop testing, by simulating its operation within a large-scale simulation that provides onboard sensing.


Bioinspired Soft Quadrotors Jointly Unlock Agility, Squeezability, and Collision Resilience

arXiv.org Artificial Intelligence

Natural flyers use soft wings to seamlessly enable a wide range of flight behaviours, including agile manoeuvres, squeezing through narrow passageways, and withstanding collisions. In contrast, conventional quadrotor designs rely on rigid frames that support agile flight but inherently limit collision resilience and squeezability, thereby constraining flight capabilities in cluttered environments. Inspired by the anisotropic stiffness and distributed mass-energy structures observed in biological organisms, we introduce FlexiQuad, a soft-frame quadrotor design approach that limits this trade-off. We demonstrate a 405-gram FlexiQuad prototype, three orders of magnitude more compliant than conventional quadrotors, yet capable of acrobatic manoeuvres with peak speeds above 80 km/h and linear and angular accelerations exceeding 3 g and 300 rad/s², respectively. Analysis demonstrates it can replicate accelerations of rigid counterparts up to a thrust-to-weight ratio of 8. Simultaneously, FlexiQuad exhibits fourfold higher collision resilience, surviving frontal impacts at 5 m/s without damage and reducing destabilising forces in glancing collisions by a factor of 39. Its frame can fully compress, enabling flight through gaps as narrow as 70% of its nominal width. Our analysis identifies an optimal structural softness range, from 0.006 to 0.77 N/mm, comparable to that of natural flyers' wings, whereby agility, squeezability, and collision resilience are jointly achieved for FlexiQuad models from 20 to 3000 grams. FlexiQuad expands hovering drone capabilities in complex environments, enabling robust physical interactions without compromising flight performance.


An End-to-End Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drones

arXiv.org Artificial Intelligence

The emergence of truck-drone collaborative systems in last-mile logistics has positioned the Traveling Salesman Problem with Drones (TSP-D) as a pivotal extension of classical routing optimization, where synchronized vehicle coordination promises substantial operational efficiency and reduced environmental impact, yet introduces NP-hard combinatorial complexity beyond the reach of conventional optimization paradigms. Deep reinforcement learning offers a theoretically grounded framework to address TSP-D's inherent challenges through self-supervised policy learning and adaptive decision-making. This study proposes a hierarchical Actor-Critic deep reinforcement learning framework for solving the TSP-D problem. The architecture consists of two primary components: a Transformer-inspired encoder and an efficient Minimal Gated Unit decoder. The encoder incorporates a novel, optimized k-nearest neighbors sparse attention mechanism specifically for focusing on relevant spatial relationships, further enhanced by the integration of global node features. The Minimal Gated Unit decoder processes these encoded representations to efficiently generate solution sequences. The entire framework operates within an asynchronous advantage actor-critic paradigm. Experimental results show that, on benchmark TSP-D instances of various scales (N=10 to 100), the proposed model can obtain competitive or even superior solutions in shorter average computation times compared to high-performance heuristic algorithms and existing reinforcement learning methods. Moreover, compared to advanced reinforcement learning algorithm benchmarks, the proposed framework significantly reduces the total training time required while achieving superior final performance, highlighting its notable advantage in training efficiency.


Ukraine drone strikes throw power supplies into disarray in Russian cities

Al Jazeera

Ukraine has hit back at Russia's attempts to disable its energy infrastructure with air strikes that succeeded in disrupting power and heating in two cities across the border. Alexander Gusev, regional governor of Voronezh, said several drones were electronically jammed over the city - home to more than one million people - and sparked a fire at a local utility facility that was quickly extinguished. A Russian Defence Ministry statement made no mention of either the Voronezh or Belgorod areas, reporting 44 Ukrainian drones were destroyed or intercepted by Russian forces during the night.