Reviews: Self-Normalizing Neural Networks
Neural Information Processing Systems
The paper proposes a new (class of) activation function f to more efficiently train very deep feed-forward neural networks (networks using f are called SNNs). The authors argue that 1) SNN activations converge towards a normalized distribution, and 2) SGD is more stable because the SNN approximately preserves variance from layer to layer. In fact, f belongs to a family of activation functions for which theoretical guarantees of fixed-point convergence exist. These functions are contraction mappings and are characterized by mean/variance preservation across layers at the fixed point -- solving these constraints allows, in principle, finding other "self-normalizing" f. Whether f converges to a fixed point is sensitive to the choice of hyper-parameters: the authors demonstrate weight initializations and parameter settings that yield the fixed-point behavior.
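Since the review turns on this fixed-point property, a minimal numpy sketch may help make it concrete. The paper's f is SELU: f(x) = λx for x > 0 and λα(e^x − 1) otherwise, with λ ≈ 1.0507 and α ≈ 1.6733 chosen so a unit-Gaussian input keeps zero mean and unit variance. The network width, depth, batch size, and seed below are illustrative, not from the paper; the weight initialization N(0, 1/n) is the one the authors recommend.

```python
import numpy as np

# SELU constants from the paper: they solve the fixed-point
# constraints E[f(z)] = 0 and Var[f(z)] = 1 for z ~ N(0, 1).
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit."""
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

# Empirical check (illustrative sizes): push standard-normal inputs
# through a deep stack of random layers with weights ~ N(0, 1/n) and
# watch activation mean/variance settle near the (0, 1) fixed point.
rng = np.random.default_rng(0)
n, depth = 1024, 32
x = rng.standard_normal((512, n))
for _ in range(depth):
    W = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, n))
    x = selu(x @ W)
print(f"after {depth} layers: mean={x.mean():+.3f}, var={x.var():.3f}")
# Expected: mean near 0 and variance near 1, i.e. the
# self-normalizing behavior the review describes.
```

Re-running with a different initialization scale (e.g. weight variance 2/n instead of 1/n) moves the activations away from the (0, 1) fixed point, which is the hyper-parameter sensitivity noted above.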