Supplementary Material

A Derivations and Further Technical Details

A.1 Proof of Proposition 1


Following Haarnoja et al. [13], we can now rewrite Equation (A.4) as

A.3 Regularized Maximum Likelihood Estimation

To address the collapse in predictive variance away from the offline dataset under MLE training seen in Figure 1, Wu et al. [51] in practice augment the usual MLE loss with an entropy bonus as follows:

\hat{\pi} = \arg\max_{\pi} \; \mathbb{E}_{(s,a) \sim \mathcal{D}} \big[ \log \pi(a \mid s) \big] + \alpha \, \mathbb{E}_{s \sim \mathcal{D}} \big[ \mathcal{H}\big( \pi(\cdot \mid s) \big) \big].

Whilst entropy regularization partially mitigates the collapse of predictive variance away from the expert demonstrations, we still observe the same incorrect trend as in Figure 1: predictive variances are high near the expert demonstrations and low on unseen data. The variance surface also becomes more poorly behaved, with "islands" of high predictive variance appearing away from the data.

Figure 12 shows the predictive variances of behavioral policies trained on expert demonstrations for the "door-binary-v0" environment with varying Tikhonov regularization coefficients λ. Similarly, Tikhonov regularization does not resolve the issue with calibration of uncertainties. We also observe that too high a regularization strength causes the model to underfit the variances of the data.
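To make the two regularizers concrete, the following is a minimal sketch (not the paper's implementation) of entropy-regularized MLE training for a diagonal Gaussian behavioral policy, with Tikhonov regularization applied as an L2 penalty on the network weights via the optimizer's weight decay. The names GaussianPolicy, bc_loss, and entropy_coef, the coefficient values, and the door-binary-v0 state/action dimensions are illustrative assumptions; whether the regularizers are applied exactly this way in the paper is not specified here.

```python
# Minimal sketch (assumed, not the paper's code): entropy-regularized MLE for a
# diagonal Gaussian behavioral policy, plus Tikhonov (L2) weight regularization.
import torch
import torch.nn as nn
from torch.distributions import Normal


class GaussianPolicy(nn.Module):
    """Hypothetical diagonal Gaussian policy pi(a|s) with state-dependent variance."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, action_dim)
        self.log_std_head = nn.Linear(hidden, action_dim)

    def forward(self, state: torch.Tensor) -> Normal:
        h = self.trunk(state)
        mean = self.mean_head(h)
        log_std = self.log_std_head(h).clamp(-5.0, 2.0)  # keep variances numerically stable
        return Normal(mean, log_std.exp())


def bc_loss(policy: GaussianPolicy,
            states: torch.Tensor,
            actions: torch.Tensor,
            entropy_coef: float = 0.1) -> torch.Tensor:
    """Negative log-likelihood minus an entropy bonus (coefficient is illustrative)."""
    dist = policy(states)
    nll = -dist.log_prob(actions).sum(dim=-1)   # MLE term, summed over action dimensions
    entropy = dist.entropy().sum(dim=-1)        # closed-form Gaussian entropy per state
    return (nll - entropy_coef * entropy).mean()


# Tikhonov regularization on the weights enters through weight_decay; the value
# plays the role of lambda in the text and is purely illustrative here.
policy = GaussianPolicy(state_dim=39, action_dim=28)  # assumed door-binary-v0 sizes
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4, weight_decay=1e-4)


def train_step(states: torch.Tensor, actions: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = bc_loss(policy, states, actions)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the entropy bonus only inflates the predicted variance where the data already constrain the mean, which is consistent with the observation above that regularization of this kind does not by itself produce higher uncertainty away from the demonstrations.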
