Markov Models
Learning Generalizable Policy for Obstacle-Aware Autonomous Drone Racing
Autonomous drone racing has gained attention for its potential to push the boundaries of drone navigation technologies. While much of the existing research focuses on racing in obstacle-free environments, few studies have addressed the complexities of obstacle-aware racing, and approaches presented in these studies often suffer from overfitting, with learned policies generalizing poorly to new environments. This work addresses the challenge of developing a generalizable obstacle-aware drone racing policy using deep reinforcement learning. We propose applying domain randomization on racing tracks and obstacle configurations before every rollout, combined with parallel experience collection in randomized environments to achieve the goal. The proposed randomization strategy is shown to be effective through simulated experiments where drones reach speeds of up to 70 km/h, racing in unseen cluttered environments. This study serves as a stepping stone toward learning robust policies for obstacle-aware drone racing and general-purpose drone navigation in cluttered environments. Code is available at https://github.com/ErcBunny/IsaacGymEnvs.
ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
Tie, Chenrui, Chen, Yue, Wu, Ruihai, Dong, Boxuan, Li, Zeyi, Gao, Chongkai, Dong, Hao
Imitation learning, e.g., diffusion policy, has been proven effective in various robotic manipulation tasks. However, extensive demonstrations are required for policy robustness and generalization. To reduce the demonstration reliance, we leverage spatial symmetry and propose ET-SEED, an efficient trajectory-level SE(3) equivariant diffusion model for generating action sequences in complex robot manipulation tasks. Further, previous equivariant diffusion models require the per-step equivariance in the Markov process, making it difficult to learn policy under such strong constraints. We theoretically extend equivariant Markov kernels and simplify the condition of equivariant diffusion process, thereby significantly improving training efficiency for trajectory-level SE(3) equivariant diffusion policy in an end-to-end manner. We evaluate ET-SEED on representative robotic manipulation tasks, involving rigid body, articulated and deformable object. Experiments demonstrate superior data efficiency and manipulation proficiency of our proposed method, as well as its ability to generalize to unseen configurations with only a few demonstrations. Website: https://et-seed.github.io/
Bayesian Inference in Recurrent Explicit Duration Switching Linear Dynamical Systems
Słupiński, Mikołaj, Lipiński, Piotr
In this paper, we propose a novel model called Recurrent Explicit Duration Switching Linear Dynamical Systems (REDSLDS) that incorporates recurrent explicit duration variables into the rSLDS model. We also propose an inference and learning scheme that involves the use of P\'olya-gamma augmentation. We demonstrate the improved segmentation capabilities of our model on three benchmark datasets, including two quantitative datasets and one qualitative dataset.
The Recurrent Sticky Hierarchical Dirichlet Process Hidden Markov Model
Słupiński, Mikołaj, Lipiński, Piotr
The Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) is a natural Bayesian nonparametric extension of the classical Hidden Markov Model for learning from (spatio-)temporal data. A sticky HDP-HMM has been proposed to strengthen the self-persistence probability in the HDP-HMM. Then, disentangled sticky HDP-HMM has been proposed to disentangle the strength of the self-persistence prior and transition prior. However, the sticky HDP-HMM assumes that the self-persistence probability is stationary, limiting its expressiveness. Here, we build on previous work on sticky HDP-HMM and disentangled sticky HDP-HMM, developing a more general model: the recurrent sticky HDP-HMM (RS-HDP-HMM). We develop a novel Gibbs sampling strategy for efficient inference in this model. We show that RS-HDP-HMM outperforms disentangled sticky HDP-HMM, sticky HDP-HMM, and HDP-HMM in both synthetic and real data segmentation.
Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data
Qu, Chengrui, Shi, Laixi, Panaganti, Kishan, You, Pengcheng, Wierman, Adam
Online Reinforcement learning (RL) typically requires high-stakes online interaction data to learn a policy for a target task. This prompts interest in leveraging historical data to improve sample efficiency. The historical data may come from outdated or related source environments with different dynamics. It remains unclear how to effectively use such data in the target task to provably enhance learning and sample efficiency. To address this, we propose a hybrid transfer RL (HTRL) setting, where an agent learns in a target environment while accessing offline data from a source environment with shifted dynamics. We show that -- without information on the dynamics shift -- general shifted-dynamics data, even with subtle shifts, does not reduce sample complexity in the target environment. However, with prior information on the degree of the dynamics shift, we design HySRL, a transfer algorithm that achieves problem-dependent sample complexity and outperforms pure online RL. Finally, our experimental results demonstrate that HySRL surpasses state-of-the-art online RL baseline.
Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis
Zbeeb, Mohammad, Ghorayeb, Mohammad, Salman, Mariam
Artificial Intelligence (AI) research often aims to develop models that can generalize reliably across complex datasets, yet this remains challenging in fields where data is scarce, intricate, or inaccessible. This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize one of the most demanding structured datasets: Malicious Network Traffic. Our approach uniquely transforms numerical data into text, re-framing data generation as a language modeling task, which not only enhances data regularization but also significantly improves generalization and the quality of the synthetic data. Extensive statistical analyses demonstrate that our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data. Additionally, we conduct a comprehensive study on synthetic data applications, effectiveness, and evaluation strategies, offering valuable insights into its role across various domains. Our code and pre-trained models are openly accessible at Github, enabling further exploration and application of our methodology. Index Terms: Data synthesis, machine learning, traffic generation, privacy preserving data, generative models.
Can Robotic Cues Manipulate Human Decisions? Exploring Consensus Building via Bias-Controlled Non-linear Opinion Dynamics and Robotic Eye Gaze Mediated Interaction in Human-Robot Teaming
Kumar, Rajul, Bhatti, Adam, Yao, Ningshi
Although robots are becoming more advanced with human-like anthropomorphic features and decision-making abilities to improve collaboration, the active integration of humans into this process remains under-explored. This article presents the first experimental study exploring decision-making interactions between humans and robots with visual cues from robotic eyes, which can dynamically influence human opinion formation. The cues generated by robotic eyes gradually guide human decisions towards alignment with the robot's choices. Both human and robot decision-making processes are modeled as non-linear opinion dynamics with evolving biases. To examine these opinion dynamics under varying biases, we conduct numerical parametric and equilibrium continuation analyses using tuned parameters designed explicitly for the presented human-robot interaction experiment. Furthermore, to facilitate the transition from disagreement to agreement, we introduced a human opinion observation algorithm integrated with the formation of the robot's opinion, where the robot's behavior is controlled based on its formed opinion. The algorithms developed aim to enhance human involvement in consensus building, fostering effective collaboration between humans and robots. Experiments with 51 participants (N = 51) show that human-robot teamwork can be improved by guiding human decisions using robotic cues. Finally, we provide detailed insights on the effects of trust, cognitive load, and participant demographics on decision-making based on user feedback and post-experiment interviews.
Pre-trained Visual Dynamics Representations for Efficient Policy Learning
Luo, Hao, Zhou, Bohan, Lu, Zongqing
Pre-training for Reinforcement Learning (RL) with purely video data is a valuable yet challenging problem. Although in-the-wild videos are readily available and inhere a vast amount of prior world knowledge, the absence of action annotations and the common domain gap with downstream tasks hinder utilizing videos for RL pre-training. To address the challenge of pre-training with videos, we propose Pre-trained Visual Dynamics Representations (PVDR) to bridge the domain gap between videos and downstream tasks for efficient policy learning. By adopting video prediction as a pre-training task, we use a Transformer-based Conditional Variational Autoencoder (CVAE) to learn visual dynamics representations. The pre-trained visual dynamics representations capture the visual dynamics prior knowledge in the videos. This abstract prior knowledge can be readily adapted to downstream tasks and aligned with executable actions through online adaptation. We conduct experiments on a series of robotics visual control tasks and verify that PVDR is an effective form for pre-training with videos to promote policy learning.
Transformer-Based Fault-Tolerant Control for Fixed-Wing UAVs Using Knowledge Distillation and In-Context Adaptation
Giral, Francisco, Gómez, Ignacio, Vinuesa, Ricardo, Le-Clainche, Soledad
Abstract-- This study presents a transformer-based approach for fault-tolerant control in fixed-wing Unmanned Aerial Vehicles (UAVs), designed to adapt in real time to dynamic changes caused by structural damage or actuator failures. Employing a teacher-student knowledge distillation framework, the proposed approach trains a student agent with partial observations by transferring knowledge from a privileged expert agent with full observability, enabling robust performance across diverse failure scenarios. In recent years, Unmanned Aerial Vehicles (UAVs) have been widely used to perform various applications in complex However, complex environments and demanding tasks can and critical scenarios, such as search and rescue or cause structural damage to the UAV, altering its aerodynamic autonomous medical transportation. Fixed-wing UAVs, in particular, and reliability of these aerial robots have become major exhibit highly complex, nonlinear dynamics, which can concerns due to the potential implications of system failures. Unlike other robotics fields, such as manipulation and Although current FCSs are robust, they struggle to maintain humanoid locomotion, where advanced control methods are performance when the vehicle dynamics deviate from the essential for managing complex joint movements, UAV original design specifications, sometimes leading to control Flight Control Systems (FCSs) in industry typically rely divergence and catastrophic failure.
Embedding Safety into RL: A New Take on Trust Region Methods
Milosevic, Nikola, Müller, Johannes, Scherf, Nico
Reinforcement Learning (RL) agents are able to solve a wide variety of tasks but are prone to producing unsafe behaviors. Constrained Markov Decision Processes (CMDPs) provide a popular framework for incorporating safety constraints. However, common solution methods often compromise reward maximization by being overly conservative or allow unsafe behavior during training. We propose Constrained Trust Region Policy Optimization (C-TRPO), a novel approach that modifies the geometry of the policy space based on the safety constraints and yields trust regions composed exclusively of safe policies, ensuring constraint satisfaction throughout training. We theoretically study the convergence and update properties of C-TRPO and highlight connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Finally, we demonstrate experimentally that C-TRPO significantly reduces constraint violations while achieving competitive reward maximization compared to state-of-theart CMDP algorithms. Reinforcement Learning (RL) has emerged as a highly successful paradigm in machine learning for solving sequential decision and control problems, with policy gradient (PG) algorithms as a popular approach (Williams, 1992; Sutton et al., 1999; Konda & Tsitsiklis, 1999).