1a669e81c8093745261889539694be7f-Supplemental.pdf
–Neural Information Processing Systems
Ifweassumethereward function is a linear combination of features, it is often the case that the number of featuresk is much lessthanthetotalnumber ofstate-action pairs. When learning a posterior from demonstrations we use Bayesian IRL [4]. Bayesian IRL uses Markov chain Monte Carlo (MCMC) sampling to sample from the posterior P(R|D). The step size was tuned to result in an accept ratio close to0.4. Ifso, then we stop gradient ascent.
Neural Information Processing Systems
Feb-7-2026, 16:04:14 GMT
- Country:
- Technology: