Goto

Collaborating Authors

 Smith, Samuel


A study on the plasticity of neural networks

arXiv.org Artificial Intelligence

One aim shared by multiple settings, such as continual learning or transfer learning, is to leverage previously acquired knowledge to converge faster on the current task. Usually this is done through fine-tuning, where an implicit assumption is that the network maintains its plasticity, meaning that the performance it can reach on any given task is not affected negatively by previously seen tasks. For example, PackNet (Mallya & Lazebnik, 2017) eventually gets to a point where all neurons are frozen and learning is not possible anymore. In the same fashion, accumulating constraints in EWC (Kirkpatrick et al., 2017) might lead to a strongly regularised objective that does not allow for the new task's loss to be minimised. Alternatively, learning might become less data efficient, referred to as negative forward transfer, an effect often noticed for regularisation-based approaches.
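To make the over-regularisation effect mentioned above concrete, here is a minimal PyTorch sketch of an EWC-style quadratic penalty accumulated over tasks. It is illustrative only, not the paper's code: the helper name `ewc_penalty` and the arguments `anchor_params` and `fisher_diagonals` are assumptions introduced for this example.

```python
# Illustrative sketch of an EWC-style penalty (Kirkpatrick et al., 2017),
# not the authors' implementation. `anchor_params` and `fisher_diagonals`
# are lists with one {parameter_name: tensor} dict per previously seen task.
import torch

def ewc_penalty(model, anchor_params, fisher_diagonals, lam=1.0):
    """Accumulated quadratic penalty: (lam/2) * sum over tasks and parameters
    of F_i * (theta_i - theta*_i)^2."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        for anchors, fishers in zip(anchor_params, fisher_diagonals):
            if name in anchors:
                penalty = penalty + (fishers[name] * (param - anchors[name]) ** 2).sum()
    return 0.5 * lam * penalty

# total_loss = task_loss + ewc_penalty(model, anchor_params, fisher_diagonals)
# As more tasks contribute anchors and Fisher terms, this penalty can come to
# dominate the objective, which is the loss of plasticity discussed above.
```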


BYOL works even without batch statistics

arXiv.org Machine Learning

Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids collapse to a trivial, constant representation. Thus, it has recently been hypothesized that batch normalization (BN) is critical to prevent collapse in BYOL. Indeed, BN flows gradients across batch elements, and could leak information about negative views in the batch, which could act as an implicit negative (contrastive) term. However, we experimentally show that replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization) achieves performance comparable to vanilla BYOL ($73.9\%$ vs. $74.3\%$ top-1 accuracy under the linear evaluation protocol on ImageNet with ResNet-$50$). Our finding disproves the hypothesis that the use of batch statistics is a crucial ingredient for BYOL to learn useful representations.
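As a rough illustration of the batch-independent normalization scheme the abstract refers to (group normalization combined with weight standardization), here is a small PyTorch sketch. The names `WSConv2d` and `gn_ws_block` are hypothetical and this is not the authors' implementation.

```python
# Sketch of a batch-independent Conv -> GroupNorm -> ReLU block, replacing
# BatchNorm with group normalization plus weight standardization.
# `WSConv2d` and `gn_ws_block` are illustrative names, not from the paper's code.
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    """Conv2d with weight standardization: each output filter's weights are
    rescaled to zero mean and unit variance before the convolution."""
    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5
        return F.conv2d(x, (w - mean) / std, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

def gn_ws_block(in_ch, out_ch, num_groups=32):
    """Conv -> GroupNorm -> ReLU with no dependence on batch statistics
    (assumes out_ch is divisible by num_groups)."""
    return nn.Sequential(
        WSConv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.GroupNorm(num_groups=num_groups, num_channels=out_ch),
        nn.ReLU(inplace=True),
    )
```

Because neither weight standardization nor group normalization mixes information across batch elements, a network built from such blocks cannot rely on implicit batch-level negatives, which is the property the paper's experiment isolates.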