Reinforcement Learning

Meta-Reinforcement Learning with Self-Modifying Networks

Neural Information Processing Systems

However, these neural systems are slow learners, producing specialized agents with no mechanism to continue learning beyond their training curriculum.



Supplementary material: Inverse Reinforcement Learning in a Continuous State Space with Formal Guarantees. A: Proofs of lemmas and theorems

Neural Information Processing Systems

We note that the interchange of the integral and infinite summation is justified by Section 3.7 in [5], since the coefficients Z ... Now, define the action sequence $(a)_n$ such that $a_1 = a$ and $a_n = a_1$ for all $n > 1$. Then we can use subadditivity of measure to bound the maximum difference across all entries of [kZ]. Therefore, the induced infinity norm error of $\hat{Z}$ is less than $\varepsilon$ if the element-wise error is less than $\varepsilon/k$. Therefore, $\hat{\alpha}^\top F \phi(s)$ is $\rho$-Lipschitz if the absolute value of its derivative is bounded by $\rho$, i.e. ... Since $\hat{F}$ has all zeros beyond the $k$-th column and row, each infinite matrix $\hat{F}$ can be treated as a $k \times k$ matrix.
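
The step from element-wise error to induced infinity norm error in the excerpt above can be checked numerically. The sketch below is only an illustration, not code from the paper: the function names `induced_inf_norm` and `random_error_matrix` are hypothetical, and the error matrix is random rather than derived from any estimator. It relies on the standard fact that the induced infinity norm of a matrix is its maximum absolute row sum, so a k x k matrix whose entries are all smaller than ε/k in absolute value has norm smaller than k · (ε/k) = ε.

```python
import random


def induced_inf_norm(matrix):
    """Induced infinity norm of a matrix: the maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in matrix)


def random_error_matrix(k, eps, seed=0):
    """Hypothetical k x k error matrix with every entry strictly below eps / k.

    The 0.999 factor keeps each entry strictly inside the bound, which is
    an assumption made for this illustration only.
    """
    rng = random.Random(seed)
    bound = 0.999 * eps / k
    return [[rng.uniform(-bound, bound) for _ in range(k)] for _ in range(k)]


k, eps = 5, 1e-2
E = random_error_matrix(k, eps)

# Each row has k entries, each below eps / k in absolute value,
# so every absolute row sum (and hence the norm) stays below eps.
assert all(abs(x) < eps / k for row in E for x in row)
assert induced_inf_norm(E) < eps
```

The bound is tight only when every entry in some row sits at the limit ε/k, which is why the element-wise tolerance has to shrink as k grows.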


384babc3e7faa44cf1ca671b74499c3b-Paper.pdf

Neural Information Processing Systems

The IRL setting is remarkably useful for automated control, in situations where the reward function is difficult to specify manually, or as a means to extract agent preference.