The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
Neural Information Processing Systems
The surprising discovery of the BYOL method shows that negative samples can be replaced by adding a prediction head to the network. It is mysterious why, even though trivial collapsed global optimal solutions exist, neural networks trained by (stochastic) gradient descent can still learn competitive representations.
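A minimal sketch can make the "trivial collapsed global optimal solutions" concrete. It assumes a BYOL-style loss: the normalized mean squared error between the online branch's predictor output and the target branch's projection, which equals 2 - 2·cos(p, z). The encoder and predictor below are hypothetical stand-ins and do not reproduce the paper's analysis. An encoder that maps every input to the same vector, followed by an identity predictor, already reaches the loss's global minimum of zero.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def byol_loss(p, z):
    """Normalized MSE between prediction p and target z, i.e. 2 - 2*cos(p, z).

    In BYOL the target z would be treated as a constant (stop-gradient);
    no gradients are computed in this sketch.
    """
    p_hat, z_hat = normalize(p), normalize(z)
    return sum((a - b) ** 2 for a, b in zip(p_hat, z_hat))


def collapsed_encoder(x):
    """Hypothetical collapsed encoder: ignores its input, returns a fixed vector."""
    return [1.0, 0.0, 0.0]


def identity_predictor(z):
    """Hypothetical prediction head that simply copies its input."""
    return list(z)


# Pairs of (view_a, view_b) standing in for two augmentations of one image.
views = [([0.3, -1.2], [0.5, 2.0]), ([4.0, 0.1], [-3.0, 0.7])]
losses = [
    byol_loss(identity_predictor(collapsed_encoder(a)), collapsed_encoder(b))
    for a, b in views
]
print(losses)  # [0.0, 0.0]: the collapsed solution is a global optimum
```

By contrast, orthogonal prediction and target vectors give a loss of 2.0. Because the loss alone cannot rule out collapse, the puzzle is why gradient-based training with a prediction head avoids it.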
Dec-24-2025, 21:14:08 GMT