Identifiability in Inverse Reinforcement Learning: Supplementary Material
A Appendix: Proofs of Results
Proof of Theorem 1. Fix ...
–Neural Information Processing Systems
Combining these inequalities, along with the fact γ < 1, we conclude that g(s) ≡ 0 for all s ∈ S. ... Hence, as γ < 1, we conclude that g(s) ≡ 0 for all s ∈ S. ... Given we know both agents' policies ( ... As R is closed under addition, we see that c = λa + µb ∈ R. ... Therefore, all states can be accessed from s. ... If the starting state is ephemeral, it is clear that we can add a constant to its rewards independently of all other states' rewards, as this will not affect decision ... Proof of Theorem 4. We first prove the sufficiency statement.
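The remark about an ephemeral starting state can be checked numerically on a toy problem. The sketch below is illustrative only and is not taken from the paper: it assumes an entropy-regularized (softmax) agent, reads "ephemeral" as "no action ever leads back to this state", and uses a hypothetical 3-state, 2-action deterministic MDP (`NEXT_STATE`, `REWARDS`, `soft_policy`, and the other names are invented for the example).

```python
import math

GAMMA = 0.9
# Hypothetical deterministic MDP: NEXT_STATE[s][a] is the successor state.
# State 0 is the start state and is never re-entered (assumed "ephemeral").
NEXT_STATE = [[1, 2], [1, 2], [1, 2]]
REWARDS = [[0.5, -0.2], [1.0, 0.0], [0.3, 0.7]]


def soft_policy(rewards, n_iter=500):
    """Soft value iteration; returns the softmax policy pi[s][a]."""
    v = [0.0] * len(rewards)
    for _ in range(n_iter):
        # Q(s, a) = r(s, a) + gamma * V(next state)
        q = [[r + GAMMA * v[nxt] for r, nxt in zip(r_row, n_row)]
             for r_row, n_row in zip(rewards, NEXT_STATE)]
        # V(s) = log sum_a exp Q(s, a)
        v = [math.log(sum(math.exp(x) for x in row)) for row in q]
    # pi(a | s) = exp(Q(s, a) - V(s)), using the last sweep's Q and V.
    return [[math.exp(x - v_s) for x in row] for row, v_s in zip(q, v)]


def shift_state(rewards, state, c):
    """Add the constant c to every action's reward in one state."""
    return [[r + c if s == state else r for r in row]
            for s, row in enumerate(rewards)]


def max_policy_gap(p1, p2):
    """Largest absolute difference between two policies."""
    return max(abs(a - b) for r1, r2 in zip(p1, p2) for a, b in zip(r1, r2))


base = soft_policy(REWARDS)
# Shifting the ephemeral start state's rewards leaves the policy unchanged.
gap_ephemeral = max_policy_gap(base, soft_policy(shift_state(REWARDS, 0, 3.0)))
# Shifting a state that can be revisited does change the policy.
gap_recurrent = max_policy_gap(base, soft_policy(shift_state(REWARDS, 1, 3.0)))
print(gap_ephemeral, gap_recurrent)
```

In this toy setting the first gap is zero up to floating-point error, because the constant shifts every Q-value at state 0 equally and no other state's value depends on state 0. The second gap is clearly nonzero, since the shift at a revisitable state propagates through the value function and changes which successor is preferred.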
Aug-14-2025, 22:16:48 GMT