Supplementary material: Inverse Reinforcement Learning in a Continuous State Space with Formal Guarantees
A Proofs of lemmas and theorems
Neural Information Processing Systems
A.1 Additional lemma

Lemma 9 Let $s_0$ be the starting state, let $(a)_n$ represent a sequence of actions, and let $M = Z(a_r) Z(a_{r-1}) \cdots Z(a_1)$, i.e., the product of the matrices in $\{Z(a)\}$ left-multiplied in the order of the sequence [...]

Proof. Here we use proof by induction. We note that the interchange of the integral and the infinite summation is justified by Section 3.7 in [5], since the coefficients $Z$ [...] We can then conclude the statement of the lemma by induction.

A.2 Proof of Proposition 1

Proof. By Lemma 9, given a fixed sequence of actions $(a)_n$, the $r$-th state $s_r$ under this sequence of actions, starting from state $s_0$, has a distribution that can be represented over the basis $\{\phi_n(s)\}$. Therefore, for any state $s_r$ with $r > 0$, the expected reward under any sequence of actions is the same for the reward $R$ as for the projected reward R0. The reward at the starting state, $R(s_0)$, does not depend on the policy, so its value is the same whether or not a policy is optimal.
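The projection argument in the proof of Proposition 1 can be checked numerically in a small discretized setting. The sketch below is only an illustration under stated assumptions, not the construction used in the paper: the state space is replaced by a finite grid, the basis is a set of randomly generated orthonormal vectors, the matrices Z(a) are random, and the coefficient vector of the r-th state is assumed to be obtained by applying M to the coefficients of the starting state (the exact statement of Lemma 9 is not reproduced in this excerpt). The names `sequence_matrix`, `project_onto_basis`, and the action labels are hypothetical.

```python
import numpy as np

def sequence_matrix(Z, actions):
    """Build M = Z(a_r) ... Z(a_1) by left-multiplying in sequence order.

    `actions` must be a non-empty list of keys of the dict `Z`.
    """
    k = Z[actions[0]].shape[0]
    M = np.eye(k)
    for a in actions:
        M = Z[a] @ M  # each new action's matrix is applied on the left
    return M

def project_onto_basis(Q, R):
    """Orthogonal projection of R onto the span of the orthonormal columns of Q."""
    return Q @ (Q.T @ R)

rng = np.random.default_rng(0)
n_grid, k = 200, 5  # hypothetical grid size and number of basis functions

# Hypothetical orthonormal basis on the grid (columns of Q).
Q, _ = np.linalg.qr(rng.normal(size=(n_grid, k)))

# Hypothetical per-action coefficient matrices Z(a).
Z = {a: rng.normal(size=(k, k)) for a in ("left", "right")}
actions = ["left", "right", "right"]
M = sequence_matrix(Z, actions)

# Assumption: coefficients of the r-th state are M times those of the start state.
c0 = rng.normal(size=k)
p_r = Q @ (M @ c0)  # a function on the grid lying in the span of the basis

# An arbitrary reward on the grid and its projection onto the basis.
R = rng.normal(size=n_grid)
R_proj = project_onto_basis(Q, R)

# Inner products with anything in the span ignore the out-of-span part of R.
gap = abs(p_r @ R - p_r @ R_proj)
print(f"difference in expected reward: {gap:.2e}")
```

The random coefficients here do not yield a valid probability distribution (entries of `p_r` may be negative), but the identity being checked is linear, so it holds for any function in the span of the basis: the component of the reward orthogonal to the basis contributes nothing to the expectation.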
Apr-25-2026, 11:51:13 GMT