Numerical influence of ReLU'(0) on backpropagation: Supplementary Material
Neural Information Processing Systems
It can be inferred from Definition 1 that all elements in the definition of a ReLU network training problem are piecewise smooth, where each piece is an elementary log-exp function. We refer the reader to [30] for an introduction to piecewise smoothness, and to [8] for recent use of such notions in the context of algorithmic differentiation. Let us first argue that the results of [8] apply to Definition 1. This is Theorem 2; for s ∈ [0, T], note that a similar probabilistic argument was developed in [6]. Consider any fully connected ReLU network architecture of depth H, with the softmax function applied on the last layer.
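As a minimal illustration of why the choice of ReLU'(0) can matter numerically during backpropagation, the sketch below (an assumption of this supplement's setting, not the authors' code) runs the backward pass through a tiny one-hidden-layer network whose weights are chosen so that one pre-activation lands exactly at 0. The parameter s plays the role of the value assigned to ReLU'(0); the network, weights, and function names are all hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_prime(x, s=0.0):
    # Derivative is 0 for x < 0 and 1 for x > 0; at x == 0 the
    # value s is a choice (this is the quantity ReLU'(0) = s).
    return np.where(x > 0, 1.0, np.where(x < 0, 0.0, s))

# Hypothetical weights chosen so the first pre-activation is exactly 0.
x = np.array([1.0, -1.0])
W1 = np.array([[1.0, 1.0],   # z1[0] = 1*1 + 1*(-1) = 0  (nonsmooth point)
               [2.0, 1.0]])  # z1[1] = 2*1 + 1*(-1) = 1
w2 = np.array([1.0, 1.0])

def loss_and_grad(s):
    # Forward pass with loss L = y (identity loss for simplicity).
    z1 = W1 @ x
    y = w2 @ relu(z1)
    # Backward pass: dL/dy = 1, so dL/dz1 = w2 * ReLU'(z1).
    dz1 = w2 * relu_prime(z1, s)
    gW1 = np.outer(dz1, x)
    return y, gW1

_, g0 = loss_and_grad(s=0.0)
_, g1 = loss_and_grad(s=1.0)
print(np.allclose(g0, g1))  # False: the gradient of W1 depends on s
```

The forward value y is the same for every s, but the row of the gradient associated with the neuron whose pre-activation is exactly 0 differs between s = 0 and s = 1, which is precisely the kind of numerical discrepancy the main text analyzes.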