Towards Instance-Optimality in Online PAC Reinforcement Learning