Goto

Collaborating Authors

 system








Sample Efficient Reinforcement Learning in Mixed Systems through Augmented Samples and Its Applications to Queueing Networks

Neural Information Processing Systems

This paper considers a class of reinforcement learning problems, which involve systems with two types of states: stochastic and pseudo-stochastic. In such systems, stochastic states follow a stochastic transition kernel while the transitions of pseudo-stochastic states are deterministic {\em given} the stochastic states/transitions. We refer to such systems as mixed systems, which are widely used in various applications, including Manufacturing systems, communication networks, and queueing networks. We propose a sample-efficient RL method that accelerates learning by generating augmented data samples. The proposed algorithm is data-driven (model-free), but it learns the policy from data samples from both real and augmented samples. This method significantly improves learning by reducing the sample complexity such that the dataset only needs to have sufficient coverage of the stochastic states. We analyze the sample complexity of the proposed method under Fitted Q Iteration (FQI) and demonstrate that the optimality gap decreases as $O\left(\sqrt{\frac{1}{n}}+\sqrt{\frac{1}{m}}\right),$ where $n$ represents the number of real samples, and $m$ is the number of augmented samples per real sample. It is important to note that without augmented samples, the optimality gap is $O(1)$ due to the insufficient data coverage of the pseudo-stochastic states. Our experimental results on multiple queueing network applications confirm that the proposed method indeed significantly accelerates both deep Q-learning and deep policy gradient.


The Role of Integrity Monitoring in Connected and Automated Vehicles: Current State-of-Practice and Future Directions

Nayak, Saswat Priyadarshi, Barth, Matthew

arXiv.org Artificial Intelligence

Connected and Automated Vehicle (CAV) research has gained traction in the last decade due to significant advancements in perception, navigation, communication, and control functions. Accurate and reliable position information is needed to meet the requirements of CAV applications, especially when safety is concerned. With the advent of various perception sensors (e.g. camera, LiDAR, etc.), the vehicular positioning system has improved both in accuracy and robustness. Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) based cooperative positioning can improve the accuracy of the position estimates, but the integrity risks involved in multi-sensor fusion in a cooperative environment have not yet been fully explored. This paper reviews existing research in the field of positioning Integrity Monitoring (IM) and identifies various research gaps. Particular attention has been placed on identifying research that highlights cooperative IM methods. This analysis helps pave the way for the development of new IM frameworks for cooperative positioning solutions in the future.


DexterityGen: Foundation Controller for Unprecedented Dexterity

Yin, Zhao-Heng, Wang, Changhao, Pineda, Luis, Hogan, Francois, Bodduluri, Krishna, Sharma, Akash, Lancaster, Patrick, Prasad, Ishita, Kalakrishnan, Mrinal, Malik, Jitendra, Lambeta, Mike, Wu, Tingfan, Abbeel, Pieter, Mukadam, Mustafa

arXiv.org Artificial Intelligence

Teaching robots dexterous manipulation skills, such as tool use, presents a significant challenge. Current approaches can be broadly categorized into two strategies: human teleoperation (for imitation learning) and sim-to-real reinforcement learning. The first approach is difficult as it is hard for humans to produce safe and dexterous motions on a different embodiment without touch feedback. The second RL-based approach struggles with the domain gap and involves highly task-specific reward engineering on complex tasks. Our key insight is that RL is effective at learning low-level motion primitives, while humans excel at providing coarse motion commands for complex, long-horizon tasks. Therefore, the optimal solution might be a combination of both approaches. In this paper, we introduce DexterityGen (DexGen), which uses RL to pretrain large-scale dexterous motion primitives, such as in-hand rotation or translation. We then leverage this learned dataset to train a dexterous foundational controller. In the real world, we use human teleoperation as a prompt to the controller to produce highly dexterous behavior. We evaluate the effectiveness of DexGen in both simulation and real world, demonstrating that it is a general-purpose controller that can realize input dexterous manipulation commands and significantly improves stability by 10-100x measured as duration of holding objects across diverse tasks. Notably, with DexGen we demonstrate unprecedented dexterous skills including diverse object reorientation and dexterous tool use such as pen, syringe, and screwdriver for the first time.


Synthesis of Model Predictive Control and Reinforcement Learning: Survey and Classification

Reiter, Rudolf, Hoffmann, Jasper, Reinhardt, Dirk, Messerer, Florian, Baumgärtner, Katrin, Sawant, Shamburaj, Boedecker, Joschka, Diehl, Moritz, Gros, Sebastien

arXiv.org Artificial Intelligence

The fields of MPC and RL consider two successful control techniques for Markov decision processes. Both approaches are derived from similar fundamental principles, and both are widely used in practical applications, including robotics, process control, energy systems, and autonomous driving. Despite their similarities, MPC and RL follow distinct paradigms that emerged from diverse communities and different requirements. Various technical discrepancies, particularly the role of an environment model as part of the algorithm, lead to methodologies with nearly complementary advantages. Due to their orthogonal benefits, research interest in combination methods has recently increased significantly, leading to a large and growing set of complex ideas leveraging MPC and RL. This work illuminates the differences, similarities, and fundamentals that allow for different combination algorithms and categorizes existing work accordingly. Particularly, we focus on the versatile actor-critic RL approach as a basis for our categorization and examine how the online optimization approach of MPC can be used to improve the overall closed-loop performance of a policy.