Goto

Collaborating Authors

 Overview


Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions

arXiv.org Artificial Intelligence

Unmanned Aerial Vehicles (UAVs) are indispensable for infrastructure inspection, surveillance, and related tasks, yet they also introduce critical security challenges. This survey provides a wide-ranging examination of the anti-UAV domain, centering on three core objectives-classification, detection, and tracking-while detailing emerging methodologies such as diffusion-based data synthesis, multi-modal fusion, vision-language modeling, self-supervised learning, and reinforcement learning. We systematically evaluate state-of-the-art solutions across both single-modality and multi-sensor pipelines (spanning RGB, infrared, audio, radar, and RF) and discuss large-scale as well as adversarially oriented benchmarks. Our analysis reveals persistent gaps in real-time performance, stealth detection, and swarm-based scenarios, underscoring pressing needs for robust, adaptive anti-UAV systems. By highlighting open research directions, we aim to foster innovation and guide the development of next-generation defense strategies in an era marked by the extensive use of UAVs.


Intelligent Systems and Robotics: Revolutionizing Engineering Industries

arXiv.org Artificial Intelligence

-- A mix of intelligent systems and robotics is making engineering industries much more efficient, precise and able to adapt. How artificial intelligence (AI), machine learning (ML) and autonomous robotic technologies are changing manufacturing, civil, electrical and mechanical engineering is discussed in this paper. Based on recent findings and a sugges ted way to evaluate intelligent robotic systems in industry, we give an overview of how their use impacts productivity, safety an d operational costs. Experience and case studies confirm the benefits this area brings and the problems that have yet to be sol ved. The findings indicate that intelligent robotics involves more than a technology change; it introduces important new methods in engineering . I. INTRODUCTION Because of rapid advancements in technology, engineering industries have changed a lot.


A Survey on Improving Human Robot Collaboration through Vision-and-Language Navigation

arXiv.org Artificial Intelligence

Vision-and-Language Navigation (VLN) is a multi-modal, cooperative task requiring agents to interpret human instructions, navigate 3D environments, and communicate effectively under ambiguity. This paper presents a comprehensive review of recent VLN advancements in robotics and outlines promising directions to improve multi-robot coordination. Despite progress, current models struggle with bidirectional communication, ambiguity resolution, and collaborative decision-making in the multi-agent systems. We review approximately 200 relevant articles to provide an in-depth understanding of the current landscape. Through this survey, we aim to provide a thorough resource that inspires further research at the intersection of VLN and robotics. We advocate that the future VLN systems should support proactive clarification, real-time feedback, and contextual reasoning through advanced natural language understanding (NLU) techniques. Additionally, decentralized decision-making frameworks with dynamic role assignment are essential for scalable, efficient multi-robot collaboration. These innovations can significantly enhance human-robot interaction (HRI) and enable real-world deployment in domains such as healthcare, logistics, and disaster response.


Foundation Models for Trajectory Planning in Autonomous Driving: A Review of Progress and Open Challenges

arXiv.org Artificial Intelligence

The emergence of multi-modal foundation models has markedly transformed the technology for autonomous driving, shifting away from conventional and mostly hand-crafted design choices towards unified, foundation-model-based approaches, capable of directly inferring motion trajectories from raw sensory inputs. This new class of methods can also incorporate natural language as an additional modality, with Vision-Language-Action (VLA) models serving as a representative example. In this review, we provide a comprehensive examination of such methods through a unifying taxonomy to critically evaluate their architectural design choices, methodological strengths, and their inherent capabilities and limitations. Our survey covers 37 recently proposed approaches that span the landscape of trajectory planning with foundation models. Furthermore, we assess these approaches with respect to the openness of their source code and datasets, offering valuable information to practitioners and researchers. We provide an accompanying webpage that catalogs the methods based on our taxonomy, available at: https://github.com/fiveai/FMs-for-driving-trajectories


A Comprehensive Survey on Surgical Digital Twin

arXiv.org Artificial Intelligence

Such models are integral to the development of context-aware surgical training systems and process monitoring platforms [11], [19] as well as for encoding adaptive robotic control policies in teleoperated environments [13], [20], [78]. However, their limited capacity to capture continuous biophysical dynamics can constrain their utility in applications where physiological fidelity is essential. Recognizing the limitations inherent in purely continuous or discrete approaches, hybrid modeling strategies have emerged as a state-of-the-art solution for surgical digital twins. These frameworks integrate continuous dynamic models with discrete state machines, enabling the simultaneous tracking of physiological changes and procedural events [8], [7], [19], [37]. For example, hybrid automata have been deployed to synchronize real-time updates of tissue deformation with the sequencing of surgical tool actions [7], [19]. This integration allows digital twins to provide context-sensitive support, adapting to abrupt workflow transitions and physiological perturbations alike--a critical requirement in both routine and emergent surgical scenarios [8], [11], [7]. B. Mutual Information and Information-Theoretic Approaches With the proliferation of multi-modal surgical data, information-theoretic concepts have become indispensable for quantifying uncertainty, relevance, and redundancy across heterogeneous information streams. Mutual information I(X; Y) has been adopted as a rigorous metric for selecting the most informative sensors, imaging modalities, or clinical parameters, thereby enhancing the efficiency and robustness of digital twin-enabled decision support [2], [3], [13], [34], [11], [51], [48], [26], [29]. This is formally captured as Eq.


SafeHumanoid: VLM-RAG-driven Control of Upper Body Impedance for Humanoid Robot

arXiv.org Artificial Intelligence

Safe and trustworthy Human Robot Interaction (HRI) requires robots not only to complete tasks but also to regulate impedance and speed according to scene context and human proximity. We present SafeHumanoid, an egocentric vision pipeline that links Vision Language Models (VLMs) with Retrieval-Augmented Generation (RAG) to schedule impedance and velocity parameters for a humanoid robot. Egocentric frames are processed by a structured VLM prompt, embedded and matched against a curated database of validated scenarios, and mapped to joint-level impedance commands via inverse kinematics. We evaluate the system on tabletop manipulation tasks with and without human presence, including wiping, object handovers, and liquid pouring. The results show that the pipeline adapts stiffness, damping, and speed profiles in a context-aware manner, maintaining task success while improving safety. Although current inference latency (up to 1.4 s) limits responsiveness in highly dynamic settings, SafeHumanoid demonstrates that semantic grounding of impedance control is a viable path toward safer, standard-compliant humanoid collaboration.


Standard Occupation Classifier -- A Natural Language Processing Approach

arXiv.org Artificial Intelligence

Standard Occupational Classifiers (SOC) are systems used to categorize and classify different types of jobs and occupations based on their similarities in terms of job duties, skills, and qualifications. Integrating these facets with Big Data from job advertisement offers the prospect to investigate labour demand that is specific to various occupations. This project investigates the use of recent developments in natural language processing to construct a classifier capable of assigning an occupation code to a given job advertisement. We develop various classifiers for both UK ONS SOC and US O*NET SOC, using different Language Models. We find that an ensemble model, which combines Google BERT and a Neural Network classifier while considering job title, description, and skills, achieved the highest prediction accuracy. Specifically, the ensemble model exhibited a classification accuracy of up to 61% for the lower (or fourth) tier of SOC, and 72% for the third tier of SOC. This model could provide up to date, accurate information on the evolution of the labour market using job advertisements.


Time Extrapolation with Graph Convolutional Autoencoder and Tensor Train Decomposition

arXiv.org Artificial Intelligence

Graph autoencoders have gained attention in nonlinear reduced-order modeling of parameterized partial differential equations defined on unstructured grids. Despite they provide a geometrically consistent way of treating complex domains, applying such architectures to parame-terized dynamical systems for temporal prediction beyond the training data, i.e. the extrapolation regime, is still a challenging task due to the simultaneous need of temporal causality and general-izability in the parametric space. In this work, we explore the integration of graph convolutional autoencoders (GCAs) with tensor train (TT) decomposition and Operator Inference (OpInf) to develop a time-consistent reduced-order model. In particular, high-fidelity snapshots are represented as a combination of parametric, spatial, and temporal cores via TT decomposition, while OpInf is used to learn the evolution of the latter. Moreover, we enhance the generalization performance by developing a multi-fidelity two-stages approach in the framework of Deep Operator Networks (DeepONet), treating the spatial and temporal cores as the trunk networks, and the parametric core as the branch network. Numerical results, including heat-conduction, advection-diffusion and vortex-shedding phenomena, demonstrate great performance in effectively learning the dynamic in the extrapolation regime for complex geometries, also in comparison with state-of-the-art approaches e.g.


A Game-Theoretic Approach for Adversarial Information Fusion in Distributed Sensor Networks

arXiv.org Artificial Intelligence

Every day we share our personal information through digital systems which are constantly exposed to threats. For this reason, security-oriented disciplines of signal processing have received increasing attention in the last decades: multimedia forensics, digital watermarking, biometrics, network monitoring, steganography and steganalysis are just a few examples. Even though each of these fields has its own peculiarities, they all have to deal with a common problem: the presence of one or more adversaries aiming at making the system fail. Adversarial Signal Processing lays the basis of a general theory that takes into account the impact that the presence of an adversary has on the design of effective signal processing tools. By focusing on the application side of Adversarial Signal Processing, namely adversarial information fusion in distributed sensor networks, and adopting a game-theoretic approach, this thesis contributes to the above mission by addressing four issues. First, we address decision fusion in distributed sensor networks by developing a novel soft isolation defense scheme that protect the network from adversaries, specifically, Byzantines. Second, we develop an optimum decision fusion strategy in the presence of Byzantines. In the next step, we propose a technique to reduce the complexity of the optimum fusion by relying on a novel near-optimum message passing algorithm based on factor graphs. Finally, we introduce a defense mechanism to protect decentralized networks running consensus algorithm against data falsification attacks.


ARM-Explainer -- Explaining and improving graph neural network predictions for the maximum clique problem using node features and association rule mining

arXiv.org Artificial Intelligence

Numerous graph neural network (GNN)-based algorithms have been proposed to solve graph-based combinatorial optimization problems (COPs), but methods to explain their predictions remain largely undeveloped. We introduce ARM-Explainer, a post-hoc, model-level explainer based on association rule mining, and demonstrate it on the predictions of the hybrid geometric scattering (HGS) GNN for the maximum clique problem (MCP), a canonical NP-hard graph-based COP. The eight most explanatory association rules discovered by ARM-Explainer achieve high median lift and confidence values of 2.42 and 0.49, respectively, on test instances from the TWITTER and BHOSLIB-DIMACS benchmark datasets. ARM-Explainer identifies the most important node features, together with their value ranges, that influence the GNN's predictions on these datasets. Furthermore, augmenting the GNN with informative node features substantially improves its performance on the MCP, increasing the median largest-found clique size by 22% (from 29.5 to 36) on large graphs from the BHOSLIB-DIMACS dataset.