PAC Reinforcement Learning without Real-World Feedback
Yuren Zhong, Aniket Anand Deshmukh, Clayton Scott
This work studies reinforcement learning in the Sim-to-Real setting, in which an agent is first trained on a number of simulators before being deployed in the real world, with the aim of reducing the real-world sample complexity requirement. Using a dynamics model known as a rich observation Markov decision process (ROMDP), we formulate a theoretical framework for Sim-to-Real in the situation where feedback in the real world is unavailable. We establish real-world sample complexity guarantees that are smaller than what is currently known for learning a ROMDP directly (i.e., without access to simulators) with feedback.
Sep-24-2019