Bounds All Around: Training Energy-Based Models with Bidirectional Bounds (Supplementary Material)

Neural Information Processing Systems 

A.1 Proof of Theorem 1

Proof. The first inequality follows from Hölder's inequality. Existence is ensured as long as the chosen activation functions are differentiable almost everywhere. Smooth activations trivially satisfy this assumption, and it is worth noting that e.g. the ReLU activation, while not smooth, is differentiable everywhere except at zero and therefore also satisfies it.

We cannot guarantee through clever choices of neural architecture that the Jacobian has full rank. This is, however, a natural requirement for the generator anyway. In our model we maximize the entropy of the generator, which encourages it to produce samples that are as diverse as possible. In practice this ensures that the Jacobian has full rank, since a degenerate Jacobian implies a reduction of entropy: for a smooth injective generator g, the entropy of x = g(z) satisfies H(g(Z)) = H(Z) + E_z[ (1/2) log det(J_g(z)^T J_g(z)) ], so a rank-deficient Jacobian drives det(J^T J) to zero and the entropy contribution toward -∞.
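As an illustrative numerical sketch (not part of the paper), the connection between a degenerate Jacobian and a collapsing entropy term can be checked directly. The two toy generators below are hypothetical stand-ins: one smooth map with a generically full-rank Jacobian, and one that collapses both latent dimensions onto a single direction.

```python
import numpy as np

def jacobian(f, z, eps=1e-6):
    """Finite-difference Jacobian of f at z, shape (dim_out, dim_in)."""
    z = np.asarray(z, dtype=float)
    f0 = f(z)
    J = np.zeros((f0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

# A toy smooth generator g: R^2 -> R^3 (random illustrative weights).
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 2)), rng.normal(size=(3, 4))
g = lambda z: W2 @ np.tanh(W1 @ z)

z = rng.normal(size=2)
J = jacobian(g, z)
rank_g = np.linalg.matrix_rank(J)
# Full column rank: det(J^T J) > 0, so log det(J^T J) -- and with it the
# entropy contribution of the generator -- stays finite.
sign, logdet = np.linalg.slogdet(J.T @ J)
print(rank_g)          # 2 (full column rank, log det finite)

# A degenerate generator collapsing both latent dimensions onto one direction:
g_bad = lambda z: np.array([z[0] + z[1], z[0] + z[1], 0.0])
J_bad = jacobian(g_bad, z)
rank_bad = np.linalg.matrix_rank(J_bad)
print(rank_bad)        # 1: det(J^T J) = 0, entropy term diverges to -inf
```

Maximizing the generator's entropy penalizes exactly the second situation, which is why a degenerate Jacobian is avoided in practice.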