Policy Optimization with Model-based Explorations