Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms