Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games