AITopics

2509.1735

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceSep-26-2025

Model-Based Reinforcement Learning under Random Observation Delays

Karamzade, Armin, Kim, Kyungmin, Lanier, JB, Corsi, Davide, Fox, Roy

Delays frequently occur in real-world environments, yet standard reinforcement learning (RL) algorithms often assume instantaneous perception of the environment. We study random sensor delays in POMDPs, where observations may arrive out-of-sequence, a setting that has not been previously addressed in RL. We analyze the structure of such delays and demonstrate that naive approaches, such as stacking past observations, are insufficient for reliable performance. To address this, we propose a model-based filtering process that sequentially updates the belief state based on an incoming stream of observations. We then introduce a simple delay-aware framework that incorporates this idea into model-based RL, enabling agents to effectively handle random delays. Applying this framework to Dreamer, we compare our approach to delay-aware baselines developed for MDPs. Our method consistently outperforms these baselines and demonstrates robustness to delay distribution shifts during deployment. Additionally, we present experiments on simulated robotic tasks, comparing our method to common practical heuristics and emphasizing the importance of explicitly modeling observation delays.

machine learning, reinforcement learning, world model, (15 more...)

2509.20869

Country: North America (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

arXiv.org Artificial IntelligenceSep-26-2025

RobotDancing: Residual-Action Reinforcement Learning Enables Robust Long-Horizon Humanoid Motion Tracking

Sun, Zhenguo, Peng, Yibo, Meng, Yuan, Li, Xukun, Huang, Bo-Sheng, Bing, Zhenshan, Wang, Xinlong, Knoll, Alois

Abstract-- Long-horizon, high-dynamic motion tracking on humanoids remains brittle because absolute joint commands cannot compensate model-plant mismatch, leading to error accumulation. We propose RobotDancing, a simple, scalable framework that predicts residual joint targets to explicitly correct dynamics discrepancies. The pipeline is end-to-end--training, sim-to-sim validation, and zero-shot sim-to-real--and uses a single-stage reinforcement learning (RL) setup with a unified observation, reward, and hyperparameter configuration. RobotDancing can track multi-minute, high-energy behaviors (jumps, spins, cartwheels) and deploys zero-shot to hardware with high motion tracking quality. I. INTRODUCTION Humanoid robots are increasingly expected to execute long-horizon, highly dynamic behaviors such as dance, where small tracking errors compound rapidly and destabilize control. A principal source of such drift is the mismatch between idealized reference trajectories and the robot's true physics (actuation limits, friction, inertia, latency).

arxiv preprint arxiv, machine learning, reinforcement learning, (16 more...)

2509.20717

Country: Asia > China (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Neural Information Processing SystemsSep-25-2025, 23:56:52 GMT

3fa2d2b637122007845a2fbb7c21453b-Paper-Conference.pdf

agent, exploration, intrinsic reward, (14 more...)

Neural Information Processing Systems

Country: Asia > China (0.46)

Genre: Overview (0.67)

Industry:

Leisure & Entertainment > Games > Computer Games (0.47)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Information Processing SystemsSep-25-2025, 23:08:22 GMT

30754e5f4cd69d64b5527cdd87d3cf62-Paper-Conference.pdf

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country: North America > Canada (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Energy > Oil & Gas > Upstream (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Neural Information Processing SystemsSep-25-2025, 09:17:37 GMT

2a4310c4fd24bd336aa2f64f93cb5d39-Paper-Conference.pdf

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Industry: Energy > Oil & Gas > Upstream (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

Siddique, Umer, Sinha, Abhinav, Cao, Yongcan

Adaptive Event-Triggered Policy Gradient for Multi-Agent Reinforcement Learning

Conventional multi-agent reinforcement learning (MARL) methods rely on time-triggered execution, where agents sample and communicate actions at fixed intervals. This approach is often computationally expensive and communication-intensive. To address this limitation, we propose ET-MAPG (Event-Triggered Multi-Agent Policy Gradient reinforcement learning), a framework that jointly learns an agent's control policy and its event-triggering policy. Unlike prior work that decouples these mechanisms, ET-MAPG integrates them into a unified learning process, enabling agents to learn not only what action to take but also when to execute it. For scenarios with inter-agent communication, we introduce AET-MAPG, an attention-based variant that leverages a self-attention mechanism to learn selective communication patterns. AET-MAPG empowers agents to determine not only when to trigger an action but also with whom to communicate and what information to exchange, thereby optimizing coordination. Both methods can be integrated with any policy gradient MARL algorithm. Extensive experiments across diverse MARL benchmarks demonstrate that our approaches achieve performance comparable to state-of-the-art, time-triggered baselines while significantly reducing both computational load and communication overhead.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2509.20338

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Seo, Junwon, Nakamura, Kensuke, Bajcsy, Andrea

Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures

Recent advances in generative world models have enabled classical safe control methods, such as Hamilton-Jacobi (HJ) reachability, to generalize to complex robotic systems operating directly from high-dimensional sensor observations. However, obtaining comprehensive coverage of all safety-critical scenarios during world model training is extremely challenging. As a result, latent safety filters built on top of these models may miss novel hazards and even fail to prevent known ones, overconfidently misclassifying risky out-of-distribution (OOD) situations as safe. To address this, we introduce an uncertainty-aware latent safety filter that proactively steers robots away from both known and unseen failures. Our key idea is to use the world model's epistemic uncertainty as a proxy for identifying unseen potential hazards. We propose a principled method to detect OOD world model predictions by calibrating an uncertainty threshold via conformal prediction. By performing reachability analysis in an augmented state space-spanning both the latent representation and the epistemic uncertainty-we synthesize a latent safety filter that can reliably safeguard arbitrary policies from both known and unseen safety hazards. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our uncertainty-aware safety filter preemptively detects potential unsafe scenarios and reliably proposes safe, in-distribution actions. Video results can be found on the project website at https://cmu-intentlab.github.io/UNISafe

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2505.00779

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
(3 more...)

Simon, Raphael, Libin, Pieter, Mees, Wim

Learning Robust Penetration-Testing Policies under Partial Observability: A systematic evaluation

Penetration testing, the simulation of cyberattacks to identify security vulnerabilities, presents a sequential decision-making problem well-suited for reinforcement learning (RL) automation. Like many applications of RL to real-world problems, partial observability presents a major challenge, as it invalidates the Markov property present in Markov Decision Processes (MDPs). Partially Observable MDPs require history aggregation or belief state estimation to learn successful policies. We investigate stochastic, partially observable penetration testing scenarios over host networks of varying size, aiming to better reflect real-world complexity through more challenging and representative benchmarks. This approach leads to the development of more robust and transferable policies, which are crucial for ensuring reliable performance across diverse and unpredictable real-world environments. Using vanilla Proximal Policy Optimization (PPO) as a baseline, we compare a selection of PPO variants designed to mitigate partial observability, including frame-stacking, augmenting observations with historical information, and employing recurrent or transformer-based architectures. We conduct a systematic empirical analysis of these algorithms across different host network sizes. We find that this task greatly benefits from history aggregation. Converging three times faster than other approaches. Manual inspection of the learned policies by the algorithms reveals clear distinctions and provides insights that go beyond quantitative results.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2509.20008

Country:

North America > United States (0.46)
Europe > Belgium (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

SAGE:State-Aware Guided End-to-End Policy for Multi-Stage Sequential Tasks via Hidden Markov Decision Process

Wu, BinXu, Zhang, TengFei, Yang, Chen, Wen, JiaHao, Li, HaoCheng, Ma, JingTian, Chen, Zhen, Wang, JingYuan

Abstract--Multi-stage sequential (MSS) robotic manipulation tasks are prevalent and crucial in robotics. They often involve state ambiguity, where visually similar observations correspond to different actions. We present SAGE, a state-aware guided imitation learning framework that models tasks as a Hidden Markov Decision Process (HMDP) to explicitly capture latent task stages and resolve ambiguity. We instantiate the HMDP with a state transition network that infers hidden states, and a state-aware action policy that conditions on both observations and hidden states to produce actions, thereby enabling disambiguation across task stages. T o reduce manual annotation effort, we propose a semi-automatic labeling pipeline combining active learning and soft label interpolation. In real-world experiments across multiple complex MSS tasks with state ambiguity, SAGE achieved 100% task success under the standard evaluation protocol, markedly surpassing the baselines. Ablation studies further show that such performance can be maintained with manual labeling for only about 13% of the states, indicating its strong effectiveness. OBOTIC manipulation tasks have attracted significant attention due to their broad applications. Vision-based strategies have been widely adopted [1], and have demonstrated remarkable performance across a variety of real-world scenarios [2], [3], [4], [5], [6]. However, a particular class of tasks--Multi-Stage Sequential (MSS) tasks--introduces distinctive challenges to vision-based policies. MSS tasks are characterized by a sequence of interdependent stages that must be executed in a prescribed temporal order, often requiring the policy to perform long-horizon reasoning, retain contextual information from prior steps, and ensure coherent progression across successive stages. In such cases, visually similar observations may correspond to different actions, resulting in ambiguity during action selection. An illustrative case is the Push Buttons task shown in Figure 1. The visual observations at stages 1-1, 2-1, and 3-1 are nearly indistinguishable; however, the correct action--pressing the yellow, pink, or blue button--requires knowledge of the current task stage to be correctly determined.

artificial intelligence, machine learning, proceedings, (14 more...)

2509.19853

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)