A Loss Derivation

Neural Information Processing Systems 

In this section we provide a more detailed derivation of the proposed loss function (Equation 17). We make use of the fact that the negative entropy of a Dirichlet distribution is equal to the reverse KL-divergence to a flat Dirichlet, up to an additive constant that does not depend on the model. Additionally, by adding 1 to the target concentration parameters $\hat{\boldsymbol{\alpha}}$, we now minimize an upper bound on the KL-divergence between the mean and the ensemble. We then divide through by $\hat{\alpha}_0$ and drop the additive constant. This yields a loss that is remarkably similar to an ELBO.
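The two algebraic steps above can be checked numerically with closed-form Dirichlet quantities. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the model outputs $\mathrm{Dir}(\boldsymbol{\alpha})$, the shifted target is $\mathrm{Dir}(\hat{\boldsymbol{\alpha}} + 1)$, and the flat Dirichlet is $\mathrm{Dir}(1, \dots, 1)$, and it uses $\mathbb{E}[\ln \pi_c] = \psi(\alpha_c) - \psi(\alpha_0)$. All helper names (`digamma`, `reverse_kl_loss`, `elbo_like_loss`, etc.) and all numeric values are hypothetical. It does not verify the upper-bound claim.

```python
import math
from typing import Sequence


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0 via recurrence plus asymptotic series."""
    result = 0.0
    # Use psi(x) = psi(x + 1) - 1/x to shift x to where the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252
                         - inv2 * (1 / 240 - inv2 / 132)))))
    return result


def log_beta(alpha: Sequence[float]) -> float:
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def expected_log_pi(alpha: Sequence[float]) -> list:
    """E[ln pi_c] under Dir(alpha), i.e. psi(alpha_c) - psi(alpha_0)."""
    psi0 = digamma(sum(alpha))
    return [digamma(a) - psi0 for a in alpha]


def dirichlet_entropy(alpha: Sequence[float]) -> float:
    """Differential entropy of Dir(alpha)."""
    a0, k = sum(alpha), len(alpha)
    return (log_beta(alpha) + (a0 - k) * digamma(a0)
            - sum((a - 1.0) * digamma(a) for a in alpha))


def dirichlet_kl(alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Closed-form KL[Dir(alpha) || Dir(beta)]."""
    e_log = expected_log_pi(alpha)
    return (log_beta(beta) - log_beta(alpha)
            + sum((a - b) * e for a, b, e in zip(alpha, beta, e_log)))


def reverse_kl_loss(alpha: Sequence[float], alpha_hat: Sequence[float]) -> float:
    """Hypothetical: reverse KL to Dir(alpha_hat + 1), divided by alpha_hat_0."""
    target = [a + 1.0 for a in alpha_hat]
    return dirichlet_kl(alpha, target) / sum(alpha_hat)


def elbo_like_loss(alpha: Sequence[float], alpha_hat: Sequence[float]) -> float:
    """Hypothetical: expected cross-entropy to p_hat = alpha_hat / alpha_hat_0
    plus KL to the flat Dirichlet weighted by 1 / alpha_hat_0."""
    a_hat0 = sum(alpha_hat)
    flat = [1.0] * len(alpha)
    cross_entropy = -sum(ah / a_hat0 * e
                         for ah, e in zip(alpha_hat, expected_log_pi(alpha)))
    return cross_entropy + dirichlet_kl(alpha, flat) / a_hat0


if __name__ == "__main__":
    k = 3
    alpha_hat = [10.0, 1.0, 1.0]  # hypothetical target concentrations
    # Model-independent constant that is dropped after dividing by alpha_hat_0.
    const = (math.lgamma(k) + log_beta([a + 1.0 for a in alpha_hat])) / sum(alpha_hat)
    for alpha in ([2.0, 3.0, 4.0], [20.0, 0.5, 1.5]):  # hypothetical model outputs
        # Negative entropy equals KL to Dir(1, ..., 1) plus the constant ln Gamma(K).
        assert math.isclose(-dirichlet_entropy(alpha),
                            dirichlet_kl(alpha, [1.0] * k) + math.lgamma(k),
                            abs_tol=1e-9)
        # The scaled reverse KL and the ELBO-like form differ only by `const`.
        gap = reverse_kl_loss(alpha, alpha_hat) - elbo_like_loss(alpha, alpha_hat)
        assert math.isclose(gap, const, abs_tol=1e-9)
    print("identities hold")
```

For a fixed target, the gap between the two losses is $(\ln\Gamma(K) + \ln B(\hat{\boldsymbol{\alpha}} + 1))/\hat{\alpha}_0$ for every model output, which is why it can be dropped. What remains is an expected cross-entropy to $\hat{\boldsymbol{\alpha}}/\hat{\alpha}_0$ plus a KL-divergence to a flat prior weighted by $1/\hat{\alpha}_0$, the same shape as a negative ELBO.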
