The Role of Coverage in Online Reinforcement Learning
Xie, Tengyang, Foster, Dylan J., Bai, Yu, Jiang, Nan, Kakade, Sham M.
arXiv.org Artificial Intelligence
The last decade has seen the development of reinforcement learning algorithms with strong empirical performance in domains including robotics (Kober et al., 2013; Lillicrap et al., 2015), dialogue systems (Li et al., 2016), and personalization (Agarwal et al., 2016; Tewari and Murphy, 2017). While there is great interest in applying these techniques to real-world decision-making applications, the number of samples (steps of interaction) required to do so is often prohibitive, with state-of-the-art algorithms requiring millions of samples to reach human-level performance in challenging domains. Developing algorithms with improved sample efficiency, which entails efficiently generalizing across high-dimensional states and actions while taking advantage of problem structure as modeled by practitioners, remains a major challenge. Investigation into the design and analysis of algorithms for sample-efficient reinforcement learning has largely focused on two distinct problem formulations:
- Online reinforcement learning, where the learner can repeatedly interact with the environment by executing a policy and observing the resulting trajectory.
- Offline reinforcement learning, where the learner has access to logged transitions and rewards gathered from a fixed behavioral policy (e.g., historical data or expert demonstrations), but cannot directly interact with the underlying environment.
While these formulations share a common goal (learning a near-optimal policy), the algorithms used to achieve this goal and the conditions under which it can be achieved are seemingly quite different.
Oct-8-2022
- Country:
- North America > United States
- Illinois (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
- Jordan (0.04)
- Genre:
- Research Report (1.00)
- Instructional Material > Online (0.61)
- Technology: