Instance-Dependent Confidence and Early Stopping for Reinforcement Learning