A formal proof of an

Neural Information Processing Systems 

First, we would like to thank the reviewers for their helpful feedback. It is difficult/impossible to generate a worst case environment model in many scenarios. MDP, as opposed to the entirety of the agent's dynamics. Writing such worst-case constraints by hand is often feasible. How efficient is our algorithm?