A Loss Derivation

Apr-25-2026, 08:29:14 GMT–Neural Information Processing Systems

In this section we provide a more detailed derivation of the proposed loss function (Equation 17). We make use of the fact that the negative entropy of a Dirichlet distribution is equal to the reverse KL-divergence to a flat Dirichlet, up to an additive constant that does not depend on the model. Additionally, by adding 1 to each of the target concentration parameters, we minimize an upper bound on the KL-divergence between the mean and the ensemble. We then divide through by the target precision and drop the additive constant. This yields a loss that is remarkably similar to an ELBO.
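The first step of the derivation, the identity between the negative Dirichlet entropy and the reverse KL-divergence to a flat Dirichlet, can be checked numerically. The sketch below is illustrative only and is not the authors' code: the function names, the example concentration parameters, and the hand-written `digamma` helper are all assumptions introduced here. It uses the standard closed forms for the Dirichlet entropy and the Dirichlet-to-Dirichlet KL-divergence, with the model distribution as the first argument of the KL.

```python
import math

def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then an asymptotic series.
    Illustrative helper (assumption), accurate to roughly 1e-8."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def log_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))

def dirichlet_entropy(alpha):
    """Differential entropy of Dir(alpha)."""
    a0, k = sum(alpha), len(alpha)
    return (log_beta(alpha) + (a0 - k) * digamma(a0)
            - sum((a - 1.0) * digamma(a) for a in alpha))

def dirichlet_kl(alpha, beta):
    """KL[Dir(alpha) || Dir(beta)] in closed form."""
    a0 = sum(alpha)
    return (log_beta(beta) - log_beta(alpha)
            + sum((a - b) * (digamma(a) - digamma(a0))
                  for a, b in zip(alpha, beta)))

def gap_to_identity(alpha):
    """KL[Dir(alpha) || Dir(1)] minus (-H[Dir(alpha)] - ln Gamma(K))."""
    flat = [1.0] * len(alpha)
    constant = math.lgamma(len(alpha))  # ln Gamma(K), independent of alpha
    return dirichlet_kl(alpha, flat) - (-dirichlet_entropy(alpha) - constant)

# Hypothetical concentration parameters for a 3-class model output.
alpha_a = [2.0, 3.5, 0.7]
alpha_b = [10.0, 1.0, 4.2]
print(gap_to_identity(alpha_a), gap_to_identity(alpha_b))
```

Both gaps should vanish up to floating-point error. The constant involved is ln Γ(K), which depends only on the number of classes K, so dropping it does not change the gradients with respect to the model.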


