Identifiability in Inverse Reinforcement Learning: Supplementary Material

A Appendix: Proofs of Results

Jan-26-2025, 21:17:23 GMT–Neural Information Processing Systems

Applying Jensen's inequality, we can see that, for s ∈ arg min [...]

Combining these inequalities, along with the fact that γ < 1, we conclude that g(s) ≥ 0 for all s ∈ S. Again applying Jensen's inequality to (9), for s ∈ arg max [...]

Hence, as γ < 1, we conclude that g(s) ≤ 0 for all s ∈ S. Combining these results, we conclude that g ≡ 0, that is, V [...]

Proof of Theorem 2. From Theorem 1, if we can determine the value function for one of our agents, then the reward is uniquely identified. Given we know both agents' policies (π, π) and our agents are optimizing their respective MDPs, for every a ∈ A, s ∈ S, we know the value of λ log π(a|s) π(a|s) = γ T (s [...]

Therefore, the space of solutions to (10) is either empty (in which case no consistent reward exists), or determines v up to the addition of a constant. Given v is determined up to a constant, we can use Theorem 1 to determine f, again up to the addition of a constant.

Let R ⊆ N be a set of natural numbers, with the property that R is closed under addition (if a, b ∈ R then a + b ∈ R). Suppose R has greatest common divisor 1 (i.e.
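The step used in the proof of Theorem 2, recovering the reward f from a known value function v, can be illustrated numerically. The following is a minimal sketch, not the authors' code. It assumes the standard entropy-regularized form of the optimal policy, π(a|s) = exp((f(s,a) + γ Σ T(s'|s,a) v(s') − v(s)) / λ), which is not spelled out in the excerpt above. The function names `soft_value_iteration` and `recover_reward` and the random test MDP are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_value_iteration(f, T, gamma, lam, iters=2000):
    """Entropy-regularized value iteration (assumed setting, see lead-in).

    f: reward array of shape (S, A); T: transition array of shape (S, A, S).
    Returns the soft value function v (S,) and the soft-optimal policy pi (S, A).
    """
    v = np.zeros(f.shape[0])
    for _ in range(iters):
        q = f + gamma * (T @ v)                      # state-action values, (S, A)
        m = q.max(axis=1)                            # shift for a stable log-sum-exp
        v = m + lam * np.log(np.exp((q - m[:, None]) / lam).sum(axis=1))
    q = f + gamma * (T @ v)
    pi = np.exp((q - v[:, None]) / lam)              # rows sum to 1 at convergence
    return v, pi


def recover_reward(pi, v, T, gamma, lam):
    """Invert the assumed policy formula: f = lam*log(pi) + v(s) - gamma*E[v(s')]."""
    return lam * np.log(pi) + v[:, None] - gamma * (T @ v)


# Small random MDP, used purely as an illustration.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, lam = 4, 3, 0.9, 0.5
T = rng.random((n_states, n_actions, n_states))
T /= T.sum(axis=2, keepdims=True)                    # make each T(.|s,a) a distribution
f = rng.normal(size=(n_states, n_actions))

v, pi = soft_value_iteration(f, T, gamma, lam)
f_from_v = recover_reward(pi, v, T, gamma, lam)      # uses the exact v
f_shifted = recover_reward(pi, v + 2.0, T, gamma, lam)  # v known only up to a constant
```

In this sketch, `f_from_v` reproduces `f`, while shifting v by a constant c shifts the recovered reward by the constant (1 − γ)c. This is consistent with the statement that f is determined up to the addition of a constant when v is.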



