Optimizing Interpretable Decision Tree Policies for Reinforcement Learning