Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration