http://papers.nips.cc/paper_files/paper/2021/file/043ab21fc5a1607b381ac3896176dac6-Paper.pdf

Apr-24-2026, 11:09:12 GMT–Neural Information Processing Systems

In theory, the choice of ReLU'(0) in [0, 1] for a neural network has a negligible influence on both backpropagation and training. Yet, in the real world, the default 32-bit precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations in backpropagation outputs, which occur around half of the time in 32-bit precision. The effect disappears with double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 seems to be the most efficient. For our experiments on ImageNet, the gain in test accuracy over ReLU'(0) = 1 was more than 10 points (two runs). We also provide evidence that reconditioning approaches such as batch-norm or ADAM tend to buffer the influence of the value of ReLU'(0). Overall, the message we convey is that algorithmic differentiation of nonsmooth problems potentially hides parameters that could be tuned advantageously.
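To make the role of ReLU'(0) concrete, here is a minimal pure-Python sketch of backpropagation through a single ReLU unit. It is an illustration only, not the authors' experimental code: the names `relu_grad`, `backprop_grad`, and the parameter `s` (the value assigned to ReLU'(0)) are hypothetical, and the weights are chosen so that the pre-activation is exactly zero in floating point.

```python
def relu(x):
    """Standard ReLU: max(x, 0)."""
    return x if x > 0.0 else 0.0


def relu_grad(x, s):
    """Derivative used by backpropagation.

    s is the (hypothetical) value assigned to ReLU'(0); any s in [0, 1]
    is a valid subgradient of ReLU at 0.
    """
    if x > 0.0:
        return 1.0
    if x < 0.0:
        return 0.0
    return s


def backprop_grad(w, b, x, s):
    """Gradient of relu(w * x + b) with respect to w, via the chain rule."""
    z = w * x + b
    return relu_grad(z, s) * x


# Pre-activation is exactly zero: 0.5 * 2.0 - 1.0 == 0.0 in floating point.
grad_s0 = backprop_grad(0.5, -1.0, 2.0, s=0.0)
grad_s1 = backprop_grad(0.5, -1.0, 2.0, s=1.0)

# Away from zero (z = 0.5), the value of s plays no role.
grad_away_s0 = backprop_grad(0.75, -1.0, 2.0, s=0.0)
grad_away_s1 = backprop_grad(0.75, -1.0, 2.0, s=1.0)

print(grad_s0, grad_s1)            # differ only because z == 0.0
print(grad_away_s0, grad_away_s1)  # identical
```

In this toy case the two choices give different gradients only when the pre-activation hits zero exactly. The abstract's observation is that such exact zeros become more frequent as precision is reduced, which is why the choice matters at 16 and 32 bits but not at 64 bits.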

artificial intelligence, machine learning, relu

Country:
- Europe > France (0.16)

Industry:
- Education (0.34)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)
