Appendix A Proof of Theorems
A.1 Backward Pass: Proofs of Lemma 3.1 and Theorem 3.2
A.2 Gradient Analysis of Section 3.3

Neural Information Processing Systems 

We will also use the following well-known properties of the Kronecker product.