Non-local Policy Optimization via Diversity-regularized Collaborative Exploration