Categorical Policies: Multimodal Policy Learning and Exploration in Continuous Control