Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning