Neural Information Processing Systems
Notice that this is a contradiction, because any point with W_{L+1} = 0 is in the set S. Hence, there exists no point at which the Hessian is negative semidefinite. This can be easily seen by replacing the convex loss with the squared loss in the proof of Theorem 1 and applying (18). We conclude that, under the assumptions, the Hessian must be indefinite at every saddle point; in other words, the Hessian has at least one strictly negative eigenvalue.

B.2 Models and architectures

Every ResNEst was a standard ResNet without the batch normalization and Rectified Linear Unit (ReLU) at the final residual representation; that is, their architectures are exactly the same before the final residual representation.
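To make the architectural distinction concrete, here is a minimal NumPy sketch (not the authors' code) of a toy residual stack in the ResNEst style: residual blocks apply ReLU internally, but the final residual representation is returned raw, with no batch normalization or ReLU applied to it. The block structure, widths, and weights are arbitrary illustrative choices.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    # A toy residual block: x + W2 @ relu(W1 @ x).
    # ReLU is used inside the block, as in a standard ResNet.
    return x + W2 @ relu(W1 @ x)

def resnest_forward(x, blocks):
    # Apply the residual blocks in sequence. Unlike a standard
    # ResNet head, the final residual representation is returned
    # directly: no batch normalization or ReLU is applied here.
    for W1, W2 in blocks:
        x = residual_block(x, W1, W2)
    return x  # final residual representation, unnormalized

rng = np.random.default_rng(0)
d = 4
blocks = [(0.1 * rng.standard_normal((d, d)),
           0.1 * rng.standard_normal((d, d))) for _ in range(3)]
x = rng.standard_normal(d)
h = resnest_forward(x, blocks)
# h may contain negative entries, since no final ReLU is applied
```

With all-zero block weights each block reduces to the identity, so the forward pass returns the input unchanged, which makes the skip-connection structure easy to check.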