WALL-E: An Efficient Reinforcement Learning Research Framework

arXiv.org Machine Learning

Reinforcement learning (RL) involves an agent interacting with an environment by repeatedly running a policy π, collecting experience from each iteration, and using that experience to update the policy for maximal reward (Fig. 1).

Figure 1: RL flow chart

Thanks to advances in big data, computing power, and other machine learning disciplines, reinforcement learning has emerged as a leading field in pushing humanity closer to true artificial intelligence. Model-based reinforcement learning aims to build an accurate model of the environment dynamics (such as an MDP) and train the agent on that model, which provides model-learning capabilities and makes reward learning easier. In model-free reinforcement learning, by contrast, the agent has no explicit information about state transitions and must continuously explore and generate experience to find the optimal policy.

In recent years, major problems have arisen in the field of reinforcement learning, such as planning and how to balance exploration and exploitation. Of particular interest, however, is the problem of knowledge gathering: how to sample trajectories quickly and efficiently to gain experience and update the policy without adversely affecting average return.
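The run-collect-update loop described above can be sketched in a few lines of Python. This is only an illustrative sketch and not WALL-E's implementation: the `ChainEnv` toy environment, the helper names `collect_episode`, `update_policy`, and `train`, and the choice of tabular Q-learning with an epsilon-greedy policy (a standard model-free method) are all assumptions made here for brevity.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: walk right along a chain to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right; action 0 moves left (clamped at state 0).
        self.state = max(0, self.state + (1 if action == 1 else -1))
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.01  # small step cost, bonus at the goal
        return self.state, reward, done


def collect_episode(env, q, epsilon, rng, max_steps=50):
    """Run the current epsilon-greedy policy once and return its experience."""
    experience = []
    state = env.reset()
    for _ in range(max_steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)  # explore
        else:
            action = max((0, 1), key=lambda a: q[state][a])  # exploit
        next_state, reward, done = env.step(action)
        experience.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return experience


def update_policy(q, experience, alpha=0.5, gamma=0.9):
    """Model-free update: Q-learning applied to the collected transitions."""
    for state, action, reward, next_state, done in experience:
        target = reward if done else reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])


def train(iterations=200, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(iterations):
        experience = collect_episode(env, q, epsilon=0.2, rng=rng)  # gather
        update_policy(q, experience)                                # learn
    return q


q_table = train()
# Greedy action in each non-terminal state after training.
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(4)]
```

In this sketch, experience collection and policy updates alternate strictly in one process, so the time spent in `collect_episode` directly bounds how fast the policy can improve. That collection step is the part the knowledge-gathering problem is concerned with.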