Policy Optimization for Robust Average Reward MDPs