A Table of Notation

Notation    Description
λ, w        Hyperparameters and parameters
L T, L

Neural Information Processing Systems 

Table 2: A summary of notations used in this paper.

In this section, we present the training algorithm for Self-Tuning Networks.

Let A and B be square positive definite matrices.

C.31, we get: r λ (λ) = null

D.4 can be represented as: E

The second term in Eqn D.15 is: E

Therefore, the first and second terms correspond to the first- and second-order Taylor approximations to the loss.

In this section, we describe a structured best-response approximation for convolutional layers.