Towards Safe Policy Improvement for Non-Stationary MDPs