Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses