A Definition of a batch normalization layer

When applying batch normalization to convolutional layers, the inputs and outputs of the normalization layers are 4-dimensional tensors, which we denote by I and O.
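The excerpt only states the tensor shapes, so here is a minimal NumPy sketch of what such a layer computes, assuming an NHWC (batch, height, width, channels) layout; the function and parameter names (batch_norm_2d, gamma, beta, eps) are illustrative, not taken from the paper:

```python
import numpy as np

def batch_norm_2d(I, gamma, beta, eps=1e-5):
    """Batch-normalize a 4-D activation tensor I of shape (batch, height, width, channels).

    Per-channel statistics are computed over the batch and both spatial axes,
    then a learned per-channel scale (gamma) and shift (beta) are applied.
    """
    mean = I.mean(axis=(0, 1, 2), keepdims=True)   # shape (1, 1, 1, C)
    var = I.var(axis=(0, 1, 2), keepdims=True)
    I_hat = (I - mean) / np.sqrt(var + eps)        # normalized activations
    return gamma * I_hat + beta                    # output tensor O

# Example: a batch of 8 feature maps, 32x32 pixels, 16 channels.
I = np.random.randn(8, 32, 32, 16)
O = batch_norm_2d(I, gamma=np.ones(16), beta=np.zeros(16))
```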

Neural Information Processing Systems

For distributed training, the batch statistics are usually estimated locally on a subset of the training minibatch ("ghost batch normalization" [...]). We now define the three models in full. The inputs first pass through a single fully connected linear layer of width 1000, initialized with LeCun normal initialization [48] to preserve the variance in the absence of non-linearities. We then apply a series of residual blocks.
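A rough sketch of the "ghost batch normalization" idea mentioned above (the function name and the ghost-batch size are assumptions, not taken from the excerpt): the minibatch is split along the batch axis and each ghost batch is normalized with its own statistics, mimicking the local statistics each worker would compute in distributed training.

```python
import numpy as np

def ghost_batch_norm(I, gamma, beta, ghost_size=32, eps=1e-5):
    """Normalize each ghost batch of the minibatch independently,
    using only that ghost batch's per-channel mean and variance."""
    outputs = []
    for start in range(0, I.shape[0], ghost_size):
        chunk = I[start:start + ghost_size]
        mean = chunk.mean(axis=(0, 1, 2), keepdims=True)
        var = chunk.var(axis=(0, 1, 2), keepdims=True)
        outputs.append(gamma * (chunk - mean) / np.sqrt(var + eps) + beta)
    return np.concatenate(outputs, axis=0)
```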



DNN or $k$-NN: That is the Generalize vs. Memorize Question

Cohen, Gilad, Sapiro, Guillermo, Giryes, Raja

arXiv.org Machine Learning

This paper studies the relationship between the classification performed by deep neural networks and the $k$-NN decision in the embedding space of these networks. This simple yet important connection provides a better understanding of the relationship between the ability of neural networks to generalize and their tendency to memorize the training data, which are traditionally considered contradictory but are shown here to be compatible and complementary. Our results support the conjecture that deep neural networks approach Bayes-optimal error rates.
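As a hedged illustration of the comparison the abstract describes (the helper names and the embedding extraction step are placeholders, not the authors' code), one can classify test points with a k-NN vote in the network's embedding space and measure agreement with the network's own predictions:

```python
import numpy as np

def knn_predict(train_emb, train_labels, test_emb, k=5):
    """k-NN decision in the embedding space: for each test embedding, take the
    majority label among its k nearest training embeddings (Euclidean distance)."""
    preds = []
    for e in test_emb:
        dists = np.linalg.norm(train_emb - e, axis=1)
        nearest = train_labels[np.argsort(dists)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

# Hypothetical usage: `embed` maps inputs to the network's penultimate-layer features.
# knn_labels = knn_predict(embed(x_train), y_train, embed(x_test), k=5)
# agreement = np.mean(knn_labels == dnn_labels)   # how often the two decisions coincide
```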