Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning
Sample efficiency has been one of the major challenges for deep reinforcement learning. Recently, model-based reinforcement learning has been proposed to address this challenge by performing planning on imaginary trajectories with a learned world model. However, world model learning may suffer from overfitting to training trajectories, and thus model-based value estimation and policy search are prone to getting stuck in an inferior local policy. In this paper, we propose a novel model-based reinforcement learning algorithm, called BrIdging Reality and Dream (BIRD). It maximizes the mutual information between imaginary and real trajectories so that policy improvements learned from imaginary trajectories can be easily generalized to real trajectories. We demonstrate that our approach improves the sample efficiency of model-based planning and achieves state-of-the-art performance on challenging visual control benchmarks.
Review for NeurIPS paper: Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning
Weaknesses: While I find this paper reasonably thorough, I'm skeptical of the novelty. The two components that differentiate it from Dreamer come from the mutual information maximization objective, which amounts to maximizing the policy entropy and minimizing the model loss. While there is an ablation showing what happens if you remove the model loss component, there is no ablation showing what happens if you remove the entropy maximization. My assumption is that the core reason for improvement is the model loss, which is not a surprising result. Doing this ablation would address this concern.
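To make the reviewer's point concrete, here is a minimal sketch (not the authors' code; all function names are hypothetical) of the decomposition described above: the mutual-information objective splits into a policy-entropy term to maximize and a world-model prediction loss to minimize, so a surrogate loss can combine them as model error minus weighted entropy.

```python
import numpy as np

def policy_entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + 1e-12))

def model_loss(real_next_states, predicted_next_states):
    """Mean squared prediction error of the learned world model
    on real next states versus its imagined next states."""
    real = np.asarray(real_next_states, dtype=float)
    pred = np.asarray(predicted_next_states, dtype=float)
    return np.mean((real - pred) ** 2)

def mi_surrogate_loss(action_probs, real_next, pred_next, entropy_weight=0.1):
    """Surrogate loss to minimize: world-model error minus weighted
    policy entropy. Minimizing it corresponds to the two-term
    decomposition of the mutual-information objective noted in the
    review (entropy_weight is an assumed hyperparameter)."""
    return (model_loss(real_next, pred_next)
            - entropy_weight * policy_entropy(action_probs))
```

Ablating the entropy term, as the reviewer suggests, would mean setting `entropy_weight=0` and checking how much of the gain over Dreamer survives on the model-loss term alone.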
Review for NeurIPS paper: Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning
The paper introduces the BIRD algorithm, a model-based RL algorithm based on differentiable planning (SVG-like). A key aspect of BIRD is a mutual information term in the loss function, which encourages similarity between the imaginary data and the real observations. Reviewers generally liked this paper, even though there were some concerns about the extent of its novelty, especially compared to Dreamer. I summarize some of the concerns here, which should be addressed in the revised version of this work. Please refer to the reviews for more detail, and revise your paper by incorporating their comments.
TDM: From Model-Free to Model-Based Deep Reinforcement Learning
You've decided that you want to bike from your house near UC Berkeley to the Golden Gate Bridge. To make matters worse, you are new to the Bay Area, and all you have is a good ol' fashioned map to guide you. How do you get started? Let's first figure out how to ride a bike. One strategy would be to do a lot of studying and planning.