Identifiability in Inverse Reinforcement Learning: Supplementary Material

A Appendix: Proofs of Results

Proof of Theorem 1. Fix

Neural Information Processing Systems 

Combining these inequalities, along with the fact γ < 1, we conclude that g(s) 0 for all s ∈ S.

Hence, as γ < 1, we conclude that g(s) 0 for all s ∈ S.

Given we know both agents' policies (

As R is closed under addition, we see that c = λa + µb ∈ R.

Therefore, all states can be accessed from s.

If the starting state is ephemeral, it is clear that we can add a constant to its rewards independently of all other states' rewards, as this will not affect decision

Proof of Theorem 4. We first prove the sufficiency statement.
