Provably Efficient Exploration in Reward Machines with Low Regret
