Goto

Collaborating Authors

 Reinforcement Learning


OmniHang: Learning to Hang Arbitrary Objects using Contact Point Correspondences and Neural Collision Estimation

arXiv.org Artificial Intelligence

In this paper, we explore whether a robot can learn to hang arbitrary objects onto a diverse set of supporting items such as racks or hooks. Endowing robots with such an ability has applications in many domains such as domestic services, logistics, or manufacturing. Yet, it is a challenging manipulation task due to the large diversity of geometry and topology of everyday objects. In this paper, we propose a system that takes partial point clouds of an object and a supporting item as input and learns to decide where and how to hang the object stably. Our system learns to estimate the contact point correspondences between the object and supporting item to get an estimated stable pose. We then run a deep reinforcement learning algorithm to refine the predicted stable pose. Then, the robot needs to find a collision-free path to move the object from its initial pose to stable hanging pose. To this end, we train a neural network based collision estimator that takes as input partial point clouds of the object and supporting item. We generate a new and challenging, large-scale, synthetic dataset annotated with stable poses of objects hung on various supporting items and their contact point correspondences. In this dataset, we show that our system is able to achieve a 68.3% success rate of predicting stable object poses and has a 52.1% F1 score in terms of finding feasible paths. Supplemental material and videos are available on our project webpage.


Nearly Horizon-Free Offline Reinforcement Learning

arXiv.org Machine Learning

We revisit offline reinforcement learning on episodic time-homogeneous tabular Markov Decision Processes with $S$ states, $A$ actions and planning horizon $H$. Given the collected $N$ episodes data with minimum cumulative reaching probability $d_m$, we obtain the first set of nearly $H$-free sample complexity bounds for evaluation and planning using the empirical MDPs: 1.For the offline evaluation, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Nd_m}} \right)$ error rate, which matches the lower bound and does not have additional dependency on $\poly\left(S,A\right)$ in higher-order term, that is different from previous works~\citep{yin2020near,yin2020asymptotically}. 2.For the offline policy optimization, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Nd_m}} + \frac{S}{Nd_m}\right)$ error rate, improving upon the best known result by \cite{cui2020plug}, which has additional $H$ and $S$ factors in the main term. Furthermore, this bound approaches the $\Omega\left(\sqrt{\frac{1}{Nd_m}}\right)$ lower bound up to logarithmic factors and a high-order term. To the best of our knowledge, these are the first set of nearly horizon-free bounds in offline reinforcement learning.


Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

arXiv.org Artificial Intelligence

This paper considers batch Reinforcement Learning (RL) with general value function approximation. Our study investigates the minimal assumptions to reliably estimate/minimize Bellman error, and characterizes the generalization performance by (local) Rademacher complexities of general function classes, which makes initial steps in bridging the gap between statistical learning theory and batch RL. Concretely, we view the Bellman error as a surrogate loss for the optimality gap, and prove the followings: (1) In double sampling regime, the excess risk of Empirical Risk Minimizer (ERM) is bounded by the Rademacher complexity of the function class. (2) In the single sampling regime, sample-efficient risk minimization is not possible without further assumptions, regardless of algorithms. However, with completeness assumptions, the excess risk of FQI and a minimax style algorithm can be again bounded by the Rademacher complexity of the corresponding function classes. (3) Fast statistical rates can be achieved by using tools of local Rademacher complexity. Our analysis covers a wide range of function classes, including finite classes, linear spaces, kernel spaces, sparse linear features, etc.


Hierarchical Program-Triggered Reinforcement Learning Agents For Automated Driving

arXiv.org Artificial Intelligence

Recent advances in Reinforcement Learning (RL) combined with Deep Learning (DL) have demonstrated impressive performance in complex tasks, including autonomous driving. The use of RL agents in autonomous driving leads to a smooth human-like driving experience, but the limited interpretability of Deep Reinforcement Learning (DRL) creates a verification and certification bottleneck. Instead of relying on RL agents to learn complex tasks, we propose HPRL - Hierarchical Program-triggered Reinforcement Learning, which uses a hierarchy consisting of a structured program along with multiple RL agents, each trained to perform a relatively simple task. The focus of verification shifts to the master program under simple guarantees from the RL agents, leading to a significantly more interpretable and verifiable implementation as compared to a complex RL agent. The evaluation of the framework is demonstrated on different driving tasks, and NHTSA precrash scenarios using CARLA, an open-source dynamic urban simulation environment.


Self-Imitation Learning by Planning

arXiv.org Artificial Intelligence

Imitation learning (IL) enables robots to acquire skills quickly by transferring expert knowledge, which is widely adopted in reinforcement learning (RL) to initialize exploration. However, in long-horizon motion planning tasks, a challenging problem in deploying IL and RL methods is how to generate and collect massive, broadly distributed data such that these methods can generalize effectively. In this work, we solve this problem using our proposed approach called {self-imitation learning by planning (SILP)}, where demonstration data are collected automatically by planning on the visited states from the current policy. SILP is inspired by the observation that successfully visited states in the early reinforcement learning stage are collision-free nodes in the graph-search based motion planner, so we can plan and relabel robot's own trials as demonstrations for policy learning. Due to these self-generated demonstrations, we relieve the human operator from the laborious data preparation process required by IL and RL methods in solving complex motion planning tasks. The evaluation results show that our SILP method achieves higher success rates and enhances sample efficiency compared to selected baselines, and the policy learned in simulation performs well in a real-world placement task with changing goals and obstacles.


Utilizing reinforcement learning tools for real-time optimization of aging gas turbines

#artificialintelligence

Reinforcement learning is essentially an AI discipline in the sense that it tries to realize content that normally can only be realized by utilizing human …


Artificial Intelligence for Simple Games

#artificialintelligence

Artificial Intelligence for Simple Games - Learn how to use powerful Deep Reinforcement Learning and Artificial Intelligence tools on examples of AI simple games! Created by Jan Warchocki, Ligency TeamPreview this Course - GET COUPON CODE Ever wish you could harness the power of Deep Learning and Machine Learning to craft intelligent bots built for gaming? If you're looking for a creative way to dive into Artificial Intelligence, then'Artificial Intelligence for Simple Games' is your key to building lasting knowledge. Learn and test your AI knowledge of fundamental DL and ML algorithms using the fun and flexible environment of simple games such as Snake, the Travelling Salesman problem, mazes and more. Whether you're an absolute beginner or seasoned Machine Learning expert, this course provides a solid foundation of the basic and advanced concepts you need to build AI within a gaming environment and beyond.


Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents

arXiv.org Artificial Intelligence

Text-based games simulate worlds and interact with players using natural language. Recent work has used them as a testbed for autonomous language-understanding agents, with the motivation being that understanding the meanings of words or semantics is a key component of how humans understand, reason, and act in these worlds. However, it remains unclear to what extent artificial agents utilize semantic understanding of the text. To this end, we perform experiments to systematically reduce the amount of semantic information available to a learning agent. Surprisingly, we find that an agent is capable of achieving high scores even in the complete absence of language semantics, indicating that the currently popular experimental setup and models may be poorly designed to understand and leverage game texts. To remedy this deficiency, we propose an inverse dynamics decoder to regularize the representation space and encourage exploration, which shows improved performance on several games including Zork I. We discuss the implications of our findings for designing future agents with stronger semantic understanding.


The Gradient Convergence Bound of Federated Multi-Agent Reinforcement Learning with Efficient Communication

arXiv.org Artificial Intelligence

The paper considers a distributed version of deep reinforcement learning (DRL) for multi-agent decision-making process in the paradigm of federated learning. Since the deep neural network models in federated learning are trained locally and aggregated iteratively through a central server, frequent information exchange incurs a large amount of communication overheads. Besides, due to the heterogeneity of agents, Markov state transition trajectories from different agents are usually unsynchronized within the same time interval, which will further influence the convergence bound of the aggregated deep neural network models. Therefore, it is of vital importance to reasonably evaluate the effectiveness of different optimization methods. Accordingly, this paper proposes a utility function to consider the balance between reducing communication overheads and improving convergence performance. Meanwhile, this paper develops two new optimization methods on top of variation-aware periodic averaging methods: 1) the decay-based method which gradually decreases the weight of the model's local gradients within the progress of local updating, and 2) the consensus-based method which introduces the consensus algorithm into federated learning for the exchange of the model's local gradients. This paper also provides novel convergence guarantees for both developed methods and demonstrates their effectiveness and efficiency through theoretical analysis and numerical simulation results.


Discriminator Augmented Model-Based Reinforcement Learning

arXiv.org Artificial Intelligence

By planning through a learned dynamics model, model-based reinforcement learning (MBRL) offers the prospect of good performance with little environment interaction. However, it is common in practice for the learned model to be inaccurate, impairing planning and leading to poor performance. This paper aims to improve planning with an importance sampling framework that accounts and corrects for discrepancy between the true and learned dynamics. This framework also motivates an alternative objective for fitting the dynamics model: to minimize the variance of value estimation during planning. We derive and implement this objective, which encourages better prediction on trajectories with larger returns. We observe empirically that our approach improves the performance of current MBRL algorithms on two stochastic control problems, and provide a theoretical basis for our method.