Supplementary Material A: Derivations and Further Technical Details. A.1 Proof of Proposition 1

May-23-2025, 12:42:12 GMT–Neural Information Processing Systems

Following Haarnoja et al. [13], we can now rewrite Equation (A.4) as

[...]

A.3 Regularized Maximum Likelihood Estimation

To address the collapse in predictive variance away from the offline dataset under MLE training seen in Figure 1, Wu et al. [51] in practice augment the usual MLE loss with an entropy bonus as follows:

[...]

Whilst entropy regularization partially mitigates the collapse of predictive variance away from the expert demonstrations, we still observe the wrong trend similar to Figure 1, with predictive variances high near the expert demonstrations and low on unseen data. The variance surface also becomes more poorly behaved, with "islands" of high predictive variance appearing away from the data.

Figure 12 shows the predictive variances of behavioral policies trained on expert demonstrations for the "door-binary-v0" environment with varying Tikhonov regularization coefficients λ. Similarly, Tikhonov regularization does not resolve the issue with calibration of uncertainties. We also observe that too high a regularization strength causes the model to underfit to the variances of the data.
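The two regularizers discussed in A.3 can be illustrated with a minimal sketch. The exact objective of Wu et al. [51] is not reproduced in this text, so the form below is an assumption: a diagonal Gaussian behavioral policy, a negative log-likelihood loss with an entropy bonus weighted by a hypothetical coefficient `alpha`, and a Tikhonov (squared L2) penalty on the parameters weighted by `lam` (the λ above). All function and argument names are illustrative.

```python
import numpy as np


def gaussian_nll(actions, mean, log_std):
    """Mean negative log-likelihood of actions under a diagonal Gaussian policy."""
    var = np.exp(2.0 * log_std)
    per_dim = 0.5 * ((actions - mean) ** 2 / var + 2.0 * log_std + np.log(2.0 * np.pi))
    return per_dim.sum(axis=-1).mean()


def gaussian_entropy(log_std):
    """Mean differential entropy of a diagonal Gaussian with the given log std."""
    per_dim = log_std + 0.5 * np.log(2.0 * np.pi * np.e)
    return per_dim.sum(axis=-1).mean()


def regularized_mle_loss(actions, mean, log_std, params, alpha=0.0, lam=0.0):
    """Assumed form: NLL - alpha * entropy + lam * ||params||^2.

    alpha > 0 adds an entropy bonus; lam > 0 adds a Tikhonov penalty.
    Setting both to zero recovers the plain MLE loss.
    """
    nll = gaussian_nll(actions, mean, log_std)
    entropy_bonus = alpha * gaussian_entropy(log_std)
    tikhonov = lam * sum(np.sum(p ** 2) for p in params)
    return nll - entropy_bonus + tikhonov


# Toy batch: two one-dimensional actions, unit-variance predictions.
actions = np.zeros((2, 1))
mean = np.zeros((2, 1))
log_std = np.zeros((2, 1))
params = [np.array([1.0, 2.0])]  # stand-in for network weights

plain = regularized_mle_loss(actions, mean, log_std, params)
with_entropy = regularized_mle_loss(actions, mean, log_std, params, alpha=1.0)
with_both = regularized_mle_loss(actions, mean, log_std, params, alpha=1.0, lam=0.1)
```

In this sketch both terms are computed only from the training batch and the parameters, so neither depends on how far an input lies from the expert demonstrations.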

artificial intelligence, bayesian inference, machine learning


