1a669e81c8093745261889539694be7f-Supplemental.pdf

Neural Information Processing Systems 

If we assume the reward function is a linear combination of features, it is often the case that the number of features k is much less than the total number of state-action pairs. When learning a posterior from demonstrations we use Bayesian IRL [4]. Bayesian IRL uses Markov chain Monte Carlo (MCMC) sampling to sample from the posterior P(R|D). The step size was tuned to result in an accept ratio close to 0.4. If so, then we stop gradient ascent.
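The step-size tuning mentioned above can be illustrated with a minimal random-walk Metropolis sketch over a k-dimensional weight vector. This is not the authors' implementation: the function names `random_walk_metropolis` and `tune_step`, the multiplicative adjustment rule, the tolerance, and the toy Gaussian `toy_log_post` are all assumptions made for illustration. In Bayesian IRL the log-posterior would instead be computed from the demonstrations D, which requires solving the MDP for each proposed reward.

```python
import math
import random


def random_walk_metropolis(log_post, w0, step, n_samples, rng):
    """Random-walk Metropolis over weights w; returns (samples, accept_ratio)."""
    w = list(w0)
    lp = log_post(w)
    samples, accepted = [], 0
    for _ in range(n_samples):
        # Gaussian proposal centred on the current weights (symmetric).
        proposal = [wi + rng.gauss(0.0, step) for wi in w]
        lp_new = log_post(proposal)
        # Accept with probability min(1, exp(lp_new - lp)).
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            w, lp = proposal, lp_new
            accepted += 1
        samples.append(list(w))
    return samples, accepted / n_samples


def tune_step(log_post, w0, target=0.4, tol=0.05, step=1.0,
              rounds=50, n=2000, seed=0):
    """Hypothetical tuning loop: rescale the step until the accept ratio is near target."""
    rng = random.Random(seed)
    ratio = 0.0
    for _ in range(rounds):
        _, ratio = random_walk_metropolis(log_post, w0, step, n, rng)
        if abs(ratio - target) <= tol:
            break
        # Too many accepts means proposals are too timid, so enlarge the step;
        # too few means they are too bold, so shrink it.
        step = step * 1.1 if ratio > target else step / 1.1
    return step, ratio


def toy_log_post(w):
    """Placeholder log-posterior (standard Gaussian), NOT the Bayesian IRL posterior."""
    return -0.5 * sum(wi * wi for wi in w)


if __name__ == "__main__":
    k = 2  # number of reward features (assumed small, as in the text)
    step, ratio = tune_step(toy_log_post, [0.0] * k)
    print(f"step={step:.3f} accept_ratio={ratio:.3f}")
```

Because the proposal is symmetric, the acceptance test needs only the difference of log-posteriors. Working with k feature weights rather than one reward value per state-action pair keeps the sampled space small.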
