Supplement to 'Autoencoders that don't overfit towards the Identity'
Neural Information Processing Systems
This supplement provides: in Section 2, the proof of the Theorem in the paper; in Section 3, the derivation of the ADMM equations for optimizing Eq. 10 in the paper; in Section 4, the derivation of the update equations for optimizing Eq. 11 in the paper; and in Section 5, the generalization of Section 3 in the paper to dropout at different layers of a deep network.

The first section of the proof provides an overview: we start with the objective function of Eq. 1 in the paper (re-stated in Eq. 2 below) and show that it equals the objective function in the Theorem in the paper (see Eq. 8 below) up to the factor ap + bq, which is an irrelevant multiplicative constant when optimizing for B.

In the following, we provide the detailed steps. We first present the entire sequence of manipulations at once, and then describe each step in the text below, starting by re-stating Eq. 1 in the paper. Line 5 of this derivation states the analytic simplifications obtained for the parts (a) and (b), respectively, as the number n of training epochs approaches infinity (for convergence). The details are outlined in Sections 2.2 and 2.3 below.
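To make the structure of the argument concrete, the following is a minimal LaTeX sketch of the central expectation step for the plain-dropout special case a = b, under assumed notation (not taken verbatim from the paper): X is the data matrix, B the autoencoder weight matrix, M a binary dropout mask with i.i.d. entries M_{uk} ~ Bernoulli(q) where q = 1 - p, the corrupted input uses inverted-dropout scaling 1/q, the symbol \circ denotes the elementwise product, and \operatorname{dMat}(\cdot) denotes the diagonal matrix formed from a vector.

\begin{align*}
\mathbb{E}_{M}\!\left[ \left\| X - \tfrac{1}{q}\,(X \circ M)\,B \right\|_F^2 \right]
&= \left\| X - XB \right\|_F^2
 \;+\; \mathbb{E}_{M}\!\left[ \left\| \left(\tfrac{1}{q}\,(X \circ M) - X\right) B \right\|_F^2 \right] \\
&= \left\| X - XB \right\|_F^2
 \;+\; \tfrac{p}{q}\, \left\| \operatorname{dMat}\!\big(\operatorname{diag}(X^\top X)\big)^{1/2}\, B \right\|_F^2 .
\end{align*}

The first equality is the bias-variance decomposition, using that the corruption is unbiased, i.e., \mathbb{E}_M[\tfrac{1}{q} X \circ M] = X; the second uses that the mask entries are independent with \operatorname{Var}(M_{uk}/q) = p/q, so the cross terms vanish and only the per-entry variances, weighted by X_{uk}^2, remain on the diagonal. In the emphasized case with distinct weights a (on the dropped-out entries) and b (on the retained entries), averaging over the masks in the same way pulls out the overall factor ap + bq noted in the overview above.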