Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition