Conservative Q-Improvement: Reinforcement Learning for an Interpretable Decision-Tree Policy