Non-crossing quantile regression for deep reinforcement learning