AITopics

2509.13534

Country: Asia > China (0.28)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

arXiv.org Artificial IntelligenceSep-18-2025

Cooperative Target Detection with AUVs: A Dual-Timescale Hierarchical MARDL Approach

Xueyao, Zhang, Bo, Yang, Zhiwen, Yu, Xuelin, Cao, Alexandropoulos, George C., Debbah, Merouane, Yuen, Chau

Autonomous Underwater Vehicles (AUVs) have shown great potential for cooperative detection and reconnaissance. However, collaborative AUV communications introduce risks of exposure. In adversarial environments, achieving efficient collaboration while ensuring covert operations becomes a key challenge for underwater cooperative missions. In this paper, we propose a novel dual time-scale Hierarchical Multi-Agent Proximal Policy Optimization (H-MAPPO) framework. The high-level component determines the individuals participating in the task based on a central AUV, while the low-level component reduces exposure probabilities through power and trajectory control by the participating AUVs. Simulation results show that the proposed framework achieves rapid convergence, outperforms benchmark algorithms in terms of performance, and maximizes long-term cooperative efficiency while ensuring covert operations.

evolutionary algorithm, machine learning, reinforcement learning, (18 more...)

2509.13381

Country:

Asia > China (0.47)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
(2 more...)

JANUS: A Dual-Constraint Generative Framework for Stealthy Node Injection Attacks

Zhang, Jiahao, Pei, Xiaobing, Zhong, Zhaokun, Hao, Wenqiang, Tang, Zhenghao

Graph Neural Networks (GNNs) have demonstrated remarkable performance across various applications, yet they are vulnerable to sophisticated adversarial attacks, particularly node injection attacks. The success of such attacks heavily relies on their stealthiness, the ability to blend in with the original graph and evade detection. However, existing methods often achieve stealthiness by relying on indirect proxy metrics, lacking consideration for the fundamental characteristics of the injected content, or focusing only on imitating local structures, which leads to the problem of local myopia. To overcome these limitations, we propose a dual-constraint stealthy node injection framework, called Joint Alignment of Nodal and Universal Structures (JANUS). At the local level, we introduce a local feature manifold alignment strategy to achieve geometric consistency in the feature space. At the global level, we incorporate structured latent variables and maximize the mutual information with the generated structures, ensuring the injected structures are consistent with the semantic patterns of the original graph. We model the injection attack as a sequential decision process, which is optimized by a reinforcement learning agent. Experiments on multiple standard datasets demonstrate that the JANUS framework significantly outperforms existing methods in terms of both attack effectiveness and stealthiness.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

2509.13266

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Collaborative Loco-Manipulation for Pick-and-Place Tasks with Dynamic Reward Curriculum

An, Tianxu, De Vincenti, Flavio, Ma, Yuntao, Hutter, Marco, Coros, Stelian

Abstract--We present a hierarchical reinforcement learning (RL) pipeline for training one-armed legged robots to perform pick-and-place (P&P) tasks end-to-end--from approaching the payload to releasing it at a target area--in both single-robot and cooperative dual-robot settings. We introduce a novel dynamic reward curriculum that enables a single policy to efficiently learn long-horizon P&P operations by progressively guiding the agents through payload-centered sub-objectives. Compared to state-of-the-art approaches for long-horizon RL tasks, our method improves training efficiency by 55% and reduces execution time by 18.6 % in simulation experiments. In the dual-robot case, we show that our policy enables each robot to attend to different components of its observation space at distinct task stages, promoting effective coordination via autonomous attention shifts. T o our knowledge, this is the first RL pipeline that tackles the full scope of collaborative P&P with two legged manipulators. ANY industries rely on human workers to perform physically demanding tasks such as lifting and transporting heavy loads through cluttered environments. Logistics and construction are among the most prominent examples, where workers engage in monotonous, repetitive pick-and-place (P&P) operations with significant risks of physical injury. While advancements in hardware technology have made it possible for robots to traverse complex environments [19, 25] and dynamically manipulate rigid objects [4], the necessary levels of autonomy and coordination for long-horizon mobile manipulation have yet to be effectively demonstrated in real-world scenarios.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2509.13239

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Robots in the Workplace (0.81)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Zhang, Zhihao, Peng, Chengyang, Zhu, Minghao, Yurtsever, Ekim, Redmill, Keith A.

An Uncertainty-Weighted Decision Transformer for Navigation in Dense, Complex Driving Scenarios

Autonomous driving in dense, dynamic environments requires decision-making systems that can exploit both spatial structure and long-horizon temporal dependencies while remaining robust to uncertainty. This work presents a novel framework that integrates multi-channel bird's-eye-view occupancy grids with transformer-based sequence modeling for tactical driving in complex roundabout scenarios. To address the imbalance between frequent low-risk states and rare safety-critical decisions, we propose the Uncertainty-Weighted Decision Transformer (UWDT). UWDT employs a frozen teacher transformer to estimate per-token predictive entropy, which is then used as a weight in the student model's loss function. This mechanism amplifies learning from uncertain, high-impact states while maintaining stability across common low-risk transitions. Experiments in a roundabout simulator, across varying traffic densities, show that UWDT consistently outperforms other baselines in terms of reward, collision rate, and behavioral stability. The results demonstrate that uncertainty-aware, spatial-temporal transformers can deliver safer and more efficient decision-making for autonomous driving in complex traffic environments.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2509.13132

Country: North America > United States (0.47)

Genre: Research Report > New Finding (0.66)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.66)

Külz, Jonathan, Ha, Sehoon, Althoff, Matthias

A Design Co-Pilot for Task-Tailored Manipulators

Although robotic manipulators are used in an ever-growing range of applications, robot manufacturers typically follow a ``one-fits-all'' philosophy, employing identical manipulators in various settings. This often leads to suboptimal performance, as general-purpose designs fail to exploit particularities of tasks. The development of custom, task-tailored robots is hindered by long, cost-intensive development cycles and the high cost of customized hardware. Recently, various computational design methods have been devised to overcome the bottleneck of human engineering. In addition, a surge of modular robots allows quick and economical adaptation to changing industrial settings. This work proposes an approach to automatically designing and optimizing robot morphologies tailored to a specific environment. To this end, we learn the inverse kinematics for a wide range of different manipulators. A fully differentiable framework realizes gradient-based fine-tuning of designed robots and inverse kinematics solutions. Our generative approach accelerates the generation of specialized designs from hours with optimization-based methods to seconds, serving as a design co-pilot that enables instant adaptation and effective human-AI collaboration. Numerical experiments show that our approach finds robots that can navigate cluttered environments, manipulators that perform well across a specified workspace, and can be adapted to different hardware constraints. Finally, we demonstrate the real-world applicability of our method by setting up a modular robot designed in simulation that successfully moves through an obstacle course.

evolutionary algorithm, machine learning, reinforcement learning, (21 more...)

2509.13077

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

HLSMAC: A New StarCraft Multi-Agent Challenge for High-Level Strategic Decision-Making

Hong, Xingxing, Wang, Yungong, Jin, Dexin, Yuan, Ye, Huang, Ximing, Wu, Zijian, Li, Wenxin

Benchmarks are crucial for assessing multi-agent reinforcement learning (MARL) algorithms. While StarCraft II-related environments have driven significant advances in MARL, existing benchmarks like SMAC focus primarily on mi-cromanagement, limiting comprehensive evaluation of high-level strategic intelligence. To address this, we introduce HLSMAC, a new cooperative MARL benchmark with 12 carefully designed StarCraft II scenarios based on classical stratagems from the Thirty-Six Stratagems. Each scenario corresponds to a specific stratagem and is designed to challenge agents with diverse strategic elements, including tactical maneuvering, timing coordination, and deception, thereby opening up avenues for evaluating high-level strategic decision-making capabilities. We also propose novel metrics across multiple dimensions beyond conventional win rate, such as ability utilization and advancement efficiency, to assess agents' overall performance within the HLSMAC environment. We integrate state-of-the-art MARL algorithms and LLM-based agents with our benchmark and conduct comprehensive experiments. The results demonstrate that HLSMAC serves as a robust testbed for advancing multi-agent strategic decision-making.

machine learning, natural language, reinforcement learning, (19 more...)

2509.12927

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deichler, Anna, Wang, Siyang, Alexanderson, Simon, Beskow, Jonas

Towards Context-Aware Human-like Pointing Gestures with RL Motion Imitation

Abstract--Pointing is an important mode of interaction with robots. While large amounts of prior studies focus on recognition of human pointing, there is a lack of investigation into generating context-aware human-like pointing gestures, a shortcoming we hope to address. We first collect a rich dataset of human pointing gestures and corresponding pointing target locations with accurate motion capture. Analysis of the dataset shows that it contains various pointing styles, handedness, and well-distributed target positions in surrounding 3D space in both single-target pointing scenario and two-target point-and-place. We then train reinforcement learning (RL) control policies in physically realistic simulation to imitate the pointing motion in the dataset while maximizing pointing precision reward. We show that our RL motion imitation setup allows models to learn human-like pointing dynamics while maximizing task reward (pointing precision).

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2509.1288

Country: Europe > Sweden (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

GRATE: a Graph transformer-based deep Reinforcement learning Approach for Time-efficient autonomous robot Exploration

Ni, Haozhan, Liang, Jingsong, He, Chenyu, Cao, Yuhong, Sartoretti, Guillaume

Autonomous robot exploration (ARE) is the process of a robot autonomously navigating and mapping an unknown environment. Recent Reinforcement Learning (RL)-based approaches typically formulate ARE as a sequential decision-making problem defined on a collision-free informative graph. However, these methods often demonstrate limited reasoning ability over graph-structured data. Moreover, due to the insufficient consideration of robot motion, the resulting RL policies are generally optimized to minimize travel distance, while neglecting time efficiency. To overcome these limitations, we propose GRATE, a Deep Reinforcement Learning (DRL)-based approach that leverages a Graph Transformer to effectively capture both local structure patterns and global contextual dependencies of the informative graph, thereby enhancing the model's reasoning capability across the entire environment. In addition, we deploy a Kalman filter to smooth the waypoint outputs, ensuring that the resulting path is kinodynamically feasible for the robot to follow. Experimental results demonstrate that our method exhibits better exploration efficiency (up to 21.5% in distance and 21.3% in time to complete exploration) than state-of-the-art conventional and learning-based baselines in various simulation benchmarks. We also validate our planner in real-world scenarios.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

2509.12863

Country: Asia (0.68)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Contrastive Representation Learning for Robust Sim-to-Real Transfer of Adaptive Humanoid Locomotion

Lu, Yidan, Yang, Rurui, Kou, Qiran, Chen, Mengting, Fan, Tao, Cui, Peter, Dong, Yinzhao, Lu, Peng

Abstract-- Reinforcement learning has produced remarkable advances in humanoid locomotion, yet a fundamental dilemma persists for real-world deployment: policies must choose between the robustness of reactive proprioceptive control or the proactivity of complex, fragile perception-driven systems. Our core contribution is a contrastive learning framework that compels the actor's latent state to encode privileged environmental information from simulation. Crucially, this "distilled awareness" empowers an adaptive gait clock, allowing the policy to proactively adjust its rhythm based on an inferred understanding of the terrain. This synergy resolves the classic trade-off between rigid, clocked gaits and unstable clock-free policies. I. INTRODUCTION Achieving stable and adaptive locomotion in unstructured environments is a grand challenge for humanoid robotics. While Deep Reinforcement Learning (DRL) has become a cornerstone for synthesizing such behaviors, a fundamental information gap complicates real-world deployment.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2509.12858

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)