Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes