Blending Imitation and Reinforcement Learning for Robust Policy Improvement

Liu, Xuefeng, Yoneda, Takuma, Stevens, Rick L., Walter, Matthew R., Chen, Yuxin

Oct-4-2023–arXiv.org Machine Learning

While reinforcement learning (RL) has shown promising performance, its sample complexity remains a substantial hurdle that restricts its application across many domains. Imitation learning (IL) uses oracles to improve sample efficiency, yet it is often constrained by the quality of the oracles deployed. Robust Policy Improvement (RPI) draws on the strengths of IL, using oracle queries to facilitate exploration, which is notably challenging in sparse-reward RL, particularly during the early stages of learning. As learning unfolds, RPI gradually transitions to RL, effectively treating the learned policy as an improved oracle. The algorithm can learn from and improve upon a diverse set of black-box oracles. Integral to RPI are Robust Active Policy Selection (RAPS) and Robust Policy Gradient (RPG), both of which decide, state by state, whether to imitate the oracles or to learn from the learner's own value function once the learner's performance surpasses that of the oracles in that state.

Reinforcement learning (RL) has advanced significantly, surpassing human capabilities in domains such as Go (Silver et al., 2017), video games (Berner et al., 2019; Mnih et al., 2013), and Poker (Zhao et al., 2022). Despite such achievements, the application of RL is largely constrained by its substantial computational and data requirements and high sample complexity, particularly in fields like robotics (Singh et al., 2022) and healthcare (Han et al., 2023), where extensive online interaction for trial and error is often impractical. Imitation learning (IL) (Osa et al., 2018) improves sample efficiency by allowing the agent to replace some or all environment interactions with demonstrations provided by an oracle policy.
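The state-wise choice between imitating an oracle and relying on the learner, as described in the abstract, can be illustrated with a minimal Python sketch. This is not the paper's RAPS or RPG procedure, whose details are not given here. The function `select_policy`, the toy value functions, and the integer state encoding are all hypothetical, and the sketch assumes a value estimate is already available for each oracle and for the learner.

```python
from typing import Callable, Sequence, Tuple

State = int                      # hypothetical: states are plain integers here
ValueFn = Callable[[State], float]


def select_policy(
    state: State,
    oracle_values: Sequence[ValueFn],
    learner_value: ValueFn,
) -> Tuple[str, int]:
    """Pick which policy to follow at `state` by comparing value estimates.

    Returns ("learner", -1) when the learner's estimated value at this state
    is at least that of the best oracle, otherwise ("oracle", k) where k is
    the index of the oracle with the highest estimated value.
    Assumes at least one oracle is supplied.
    """
    best_k = max(range(len(oracle_values)), key=lambda k: oracle_values[k](state))
    if learner_value(state) >= oracle_values[best_k](state):
        return ("learner", -1)
    return ("oracle", best_k)


# Toy value estimates (illustrative only, not taken from the paper).
toy_oracles = [
    lambda s: 1.0 if s < 5 else 0.0,   # oracle 0 is strong only in early states
    lambda s: 0.5,                     # oracle 1 is mediocre everywhere
]
toy_learner = lambda s: 0.1 * s        # learner improves in later states

for s in (0, 3, 8):
    print(s, select_policy(s, toy_oracles, toy_learner))
```

In this toy setup the selector imitates oracle 0 in the early states, where its estimated value is highest, and hands control to the learner once the learner's estimate exceeds every oracle's. That mirrors the gradual shift from IL to RL that the abstract describes.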
