Combining Off and On-Policy Training in Model-Based Reinforcement Learning