Supplementary material: Inverse Reinforcement Learning in a ContinuousStateSpacewithFormalGuarantees AProofsoflemmasandtheorems

Neural Information Processing Systems 

We note that the interchange of the integral and infinite summation is justified by Section 3.7 in [5], since the coefficients Z Now,define action sequence (a)n such thata1 = a and an = a1 for alln > 1. Then we can use subadditivity of measure to bound the maximum difference across all entries of [kZ]. Therefore, the induced infinity norm error ofbZ isless thanεifthe element wise error isless than ε/k. Therefore,bα>Fφ(s) is ρ-Lipschitz if the absolute value of its derivativeisboundedbyρ,i.e. SincebF has all zeros beyond thek-th column and row, each infinite-matrix bF can be treated as ak k matrix.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found