Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation