AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

OmniHang: Learning to Hang Arbitrary Objects using Contact Point Correspondences and Neural Collision Estimation

You, Yifan, Shao, Lin, Migimatsu, Toki, Bohg, Jeannette

arXiv.org Artificial IntelligenceMar-26-2021

In this paper, we explore whether a robot can learn to hang arbitrary objects onto a diverse set of supporting items such as racks or hooks. Endowing robots with such an ability has applications in many domains such as domestic services, logistics, or manufacturing. Yet, it is a challenging manipulation task due to the large diversity of geometry and topology of everyday objects. In this paper, we propose a system that takes partial point clouds of an object and a supporting item as input and learns to decide where and how to hang the object stably. Our system learns to estimate the contact point correspondences between the object and supporting item to get an estimated stable pose. We then run a deep reinforcement learning algorithm to refine the predicted stable pose. Then, the robot needs to find a collision-free path to move the object from its initial pose to stable hanging pose. To this end, we train a neural network based collision estimator that takes as input partial point clouds of the object and supporting item. We generate a new and challenging, large-scale, synthetic dataset annotated with stable poses of objects hung on various supporting items and their contact point correspondences. In this dataset, we show that our system is able to achieve a 68.3% success rate of predicting stable object poses and has a 52.1% F1 score in terms of finding feasible paths. Supplemental material and videos are available on our project webpage.

contact point, goal pose, point cloud, (14 more...)

arXiv.org Artificial Intelligence

2103.14283

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Nearly Horizon-Free Offline Reinforcement Learning

Ren, Tongzheng, Li, Jialian, Dai, Bo, Du, Simon S., Sanghavi, Sujay

arXiv.org Machine LearningMar-25-2021

We revisit offline reinforcement learning on episodic time-homogeneous tabular Markov Decision Processes with $S$ states, $A$ actions and planning horizon $H$. Given the collected $N$ episodes data with minimum cumulative reaching probability $d_m$, we obtain the first set of nearly $H$-free sample complexity bounds for evaluation and planning using the empirical MDPs: 1.For the offline evaluation, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Nd_m}} \right)$ error rate, which matches the lower bound and does not have additional dependency on $\poly\left(S,A\right)$ in higher-order term, that is different from previous works~\citep{yin2020near,yin2020asymptotically}. 2.For the offline policy optimization, we obtain an $\tilde{O}\left(\sqrt{\frac{1}{Nd_m}} + \frac{S}{Nd_m}\right)$ error rate, improving upon the best known result by \cite{cui2020plug}, which has additional $H$ and $S$ factors in the main term. Furthermore, this bound approaches the $\Omega\left(\sqrt{\frac{1}{Nd_m}}\right)$ lower bound up to logarithmic factors and a high-order term. To the best of our knowledge, these are the first set of nearly horizon-free bounds in offline reinforcement learning.

horizon-free offline reinforcement learning

arXiv.org Machine Learning

2103.14077

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)

Add feedback

Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

Duan, Yaqi, Jin, Chi, Li, Zhiyuan

arXiv.org Artificial IntelligenceMar-25-2021

This paper considers batch Reinforcement Learning (RL) with general value function approximation. Our study investigates the minimal assumptions to reliably estimate/minimize Bellman error, and characterizes the generalization performance by (local) Rademacher complexities of general function classes, which makes initial steps in bridging the gap between statistical learning theory and batch RL. Concretely, we view the Bellman error as a surrogate loss for the optimality gap, and prove the followings: (1) In double sampling regime, the excess risk of Empirical Risk Minimizer (ERM) is bounded by the Rademacher complexity of the function class. (2) In the single sampling regime, sample-efficient risk minimization is not possible without further assumptions, regardless of algorithms. However, with completeness assumptions, the excess risk of FQI and a minimax style algorithm can be again bounded by the Rademacher complexity of the corresponding function classes. (3) Fast statistical rates can be achieved by using tools of local Rademacher complexity. Our analysis covers a wide range of function classes, including finite classes, linear spaces, kernel spaces, sparse linear features, etc.

complexity, rademacher complexity, theorem 5, (16 more...)

arXiv.org Artificial Intelligence

2103.13883

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (1.00)
Workflow (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Add feedback

Hierarchical Program-Triggered Reinforcement Learning Agents For Automated Driving

Gangopadhyay, Briti, Soora, Harshit, Dasgupta, Pallab

arXiv.org Artificial IntelligenceMar-25-2021

Recent advances in Reinforcement Learning (RL) combined with Deep Learning (DL) have demonstrated impressive performance in complex tasks, including autonomous driving. The use of RL agents in autonomous driving leads to a smooth human-like driving experience, but the limited interpretability of Deep Reinforcement Learning (DRL) creates a verification and certification bottleneck. Instead of relying on RL agents to learn complex tasks, we propose HPRL - Hierarchical Program-triggered Reinforcement Learning, which uses a hierarchy consisting of a structured program along with multiple RL agents, each trained to perform a relatively simple task. The focus of verification shifts to the master program under simple guarantees from the RL agents, leading to a significantly more interpretable and verifiable implementation as compared to a complex RL agent. The evaluation of the framework is demonstrated on different driving tasks, and NHTSA precrash scenarios using CARLA, an open-source dynamic urban simulation environment.

agent, ego vehicle, vehicle, (14 more...)

arXiv.org Artificial Intelligence

2103.13861

Country:

Asia > India > West Bengal > Kharagpur (0.05)
North America > United States > New York (0.04)
North America > United States > Texas (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Self-Imitation Learning by Planning

Luo, Sha, Kasaei, Hamidreza, Schomaker, Lambert

arXiv.org Artificial IntelligenceMar-25-2021

Imitation learning (IL) enables robots to acquire skills quickly by transferring expert knowledge, which is widely adopted in reinforcement learning (RL) to initialize exploration. However, in long-horizon motion planning tasks, a challenging problem in deploying IL and RL methods is how to generate and collect massive, broadly distributed data such that these methods can generalize effectively. In this work, we solve this problem using our proposed approach called {self-imitation learning by planning (SILP)}, where demonstration data are collected automatically by planning on the visited states from the current policy. SILP is inspired by the observation that successfully visited states in the early reinforcement learning stage are collision-free nodes in the graph-search based motion planner, so we can plan and relabel robot's own trials as demonstrations for policy learning. Due to these self-generated demonstrations, we relieve the human operator from the laborious data preparation process required by IL and RL methods in solving complex motion planning tasks. The evaluation results show that our SILP method achieves higher success rates and enhances sample efficiency compared to selected baselines, and the policy learned in simulation performs well in a real-world placement task with changing goals and obstacles.

demonstration, obstacle, reinforcement, (13 more...)

arXiv.org Artificial Intelligence

2103.13834

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Europe > Netherlands (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Utilizing reinforcement learning tools for real-time optimization of aging gas turbines

#artificialintelligenceMar-24-2021, 01:51:14 GMT

Reinforcement learning is essentially an AI discipline in the sense that it tries to realize content that normally can only be realized by utilizing human …

gas turbine, real-time optimization, utilizing reinforcement

#artificialintelligence

Industry: Media > News (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Architecture > Real Time Systems (0.85)

Add feedback

Artificial Intelligence for Simple Games

#artificialintelligenceMar-24-2021, 01:05:08 GMT

Artificial Intelligence for Simple Games - Learn how to use powerful Deep Reinforcement Learning and Artificial Intelligence tools on examples of AI simple games! Created by Jan Warchocki, Ligency TeamPreview this Course - GET COUPON CODE Ever wish you could harness the power of Deep Learning and Machine Learning to craft intelligent bots built for gaming? If you're looking for a creative way to dive into Artificial Intelligence, then'Artificial Intelligence for Simple Games' is your key to building lasting knowledge. Learn and test your AI knowledge of fundamental DL and ML algorithms using the fun and flexible environment of simple games such as Snake, the Travelling Salesman problem, mazes and more. Whether you're an absolute beginner or seasoned Machine Learning expert, this course provides a solid foundation of the basic and advanced concepts you need to build AI within a gaming environment and beyond.

artificial intelligence, intelligence, simple game, (13 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)

Add feedback

Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents

Yao, Shunyu, Narasimhan, Karthik, Hausknecht, Matthew

arXiv.org Artificial IntelligenceMar-24-2021

Text-based games simulate worlds and interact with players using natural language. Recent work has used them as a testbed for autonomous language-understanding agents, with the motivation being that understanding the meanings of words or semantics is a key component of how humans understand, reason, and act in these worlds. However, it remains unclear to what extent artificial agents utilize semantic understanding of the text. To this end, we perform experiments to systematically reduce the amount of semantic information available to a learning agent. Surprisingly, we find that an agent is capable of achieving high scores even in the complete absence of language semantics, indicating that the currently popular experimental setup and models may be poorly designed to understand and leverage game texts. To remedy this deficiency, we propose an inverse dynamics decoder to regularize the representation space and encourage exploration, which shows improved performance on several games including Zork I. We discuss the implications of our findings for designing future agents with stronger semantic understanding.

agent, drrn, representation, (14 more...)

arXiv.org Artificial Intelligence

2103.13552

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report > New Finding (0.49)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

The Gradient Convergence Bound of Federated Multi-Agent Reinforcement Learning with Efficient Communication

Xu, Xing, Li, Rongpeng, Zhao, Zhifeng, Zhang, Honggang

arXiv.org Artificial IntelligenceMar-24-2021

The paper considers a distributed version of deep reinforcement learning (DRL) for multi-agent decision-making process in the paradigm of federated learning. Since the deep neural network models in federated learning are trained locally and aggregated iteratively through a central server, frequent information exchange incurs a large amount of communication overheads. Besides, due to the heterogeneity of agents, Markov state transition trajectories from different agents are usually unsynchronized within the same time interval, which will further influence the convergence bound of the aggregated deep neural network models. Therefore, it is of vital importance to reasonably evaluate the effectiveness of different optimization methods. Accordingly, this paper proposes a utility function to consider the balance between reducing communication overheads and improving convergence performance. Meanwhile, this paper develops two new optimization methods on top of variation-aware periodic averaging methods: 1) the decay-based method which gradually decreases the weight of the model's local gradients within the progress of local updating, and 2) the consensus-based method which introduces the consensus algorithm into federated learning for the exchange of the model's local gradients. This paper also provides novel convergence guarantees for both developed methods and demonstrates their effectiveness and efficiency through theoretical analysis and numerical simulation results.

agent, convergence, learning, (16 more...)

arXiv.org Artificial Intelligence

2103.13026

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Quebec > Montreal (0.04)
(9 more...)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

Discriminator Augmented Model-Based Reinforcement Learning

Haghgoo, Behzad, Zhou, Allan, Sharma, Archit, Finn, Chelsea

arXiv.org Artificial IntelligenceMar-24-2021

By planning through a learned dynamics model, model-based reinforcement learning (MBRL) offers the prospect of good performance with little environment interaction. However, it is common in practice for the learned model to be inaccurate, impairing planning and leading to poor performance. This paper aims to improve planning with an importance sampling framework that accounts and corrects for discrepancy between the true and learned dynamics. This framework also motivates an alternative objective for fitting the dynamics model: to minimize the variance of value estimation during planning. We derive and implement this objective, which encourages better prediction on trajectories with larger returns. We observe empirically that our approach improves the performance of current MBRL algorithms on two stochastic control problems, and provide a theoretical basis for our method.

discriminator, learning, objective, (14 more...)

arXiv.org Artificial Intelligence

2103.12999

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback