Supplementary Material for Lipschitz-Certifiable Training with a Tight Outer Bound

Neural Information Processing Systems 

$\mathcal{L}(\zeta, y) \geq \mathcal{L}(z(x), y), \qquad$ (S1)

where $\mathcal{L}$ is the cross-entropy loss function. The outer bound is overestimated for two reasons. First, it is overestimated when propagating through a linear layer. Second, it is overestimated because of ReLU layers. In our algorithm, we apply the power iteration to compute the layer-wise Lipschitz constants efficiently during training; for inference, we instead run the power iteration until convergence.
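As a minimal sketch of the power iteration described above, the following NumPy function estimates the spectral norm (the $\ell_2$ Lipschitz constant) of a linear layer's weight matrix. The function name, default arguments, and stopping tolerance are illustrative, not part of the paper's implementation; a small `num_iters` corresponds to the efficient training-time setting, while a large `num_iters` with the tolerance check approximates running until convergence for inference.

```python
import numpy as np

def power_iteration(W, num_iters=1, u=None, tol=1e-6):
    """Estimate the spectral norm (largest singular value) of W,
    i.e. the Lipschitz constant of the map x -> W @ x.

    Returns the estimate and the current right-singular-vector
    iterate, which can be reused across training steps.
    """
    m, n = W.shape
    if u is None:
        # Random unit-norm starting vector (fixed seed for reproducibility).
        u = np.random.default_rng(0).standard_normal(n)
        u /= np.linalg.norm(u)
    sigma = 0.0
    for _ in range(num_iters):
        v = W @ u                      # forward pass: v ~ left singular vector
        v_norm = np.linalg.norm(v)
        if v_norm == 0.0:
            return 0.0, u              # W annihilates u; norm estimate is 0
        v /= v_norm
        u_new = W.T @ v                # backward pass: u ~ right singular vector
        sigma_new = np.linalg.norm(u_new)
        u = u_new / sigma_new
        if abs(sigma_new - sigma) < tol * sigma_new:
            return sigma_new, u        # converged (inference-time setting)
        sigma = sigma_new
    return sigma, u
```

Reusing the returned `u` as the starting vector at the next training step lets a single iteration per step track the spectral norm cheaply as the weights change slowly.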