Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning
