The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning

Dec-24-2025, 21:14:08 GMT–Neural Information Processing Systems

The surprising discovery of the BYOL method shows that negative samples can be replaced by adding a prediction head to the network. It remains mysterious why, even when trivial collapsed solutions are globally optimal, neural networks trained by (stochastic) gradient descent can still learn competitive representations.
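To make the notion of a "trivial collapsed solution" concrete, here is a minimal NumPy sketch, not the paper's code. It assumes a BYOL-style objective (2 minus twice the cosine similarity between the prediction-head output and the target embedding), a linear prediction head, and random stand-in embeddings. The names `noncontrastive_loss`, `W_pred`, `z1`, and `z2` are hypothetical. It shows that an encoder mapping every input to the same vector drives this loss to zero, its global minimum, without using any negative samples.

```python
import numpy as np

def normalize(x):
    # L2-normalize each row so that dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def noncontrastive_loss(online_pred, target_proj):
    # BYOL-style loss (assumed form): mean of 2 - 2 * cosine similarity.
    # In real training the target branch is held fixed (stop-gradient);
    # plain NumPy has no autograd, so only the loss value is computed here.
    p = normalize(online_pred)
    z = normalize(target_proj)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=1)))

rng = np.random.default_rng(0)
batch, dim = 8, 4

# Hypothetical embeddings of two augmented views of the same batch.
z1 = rng.normal(size=(batch, dim))
z2 = rng.normal(size=(batch, dim))

# Hypothetical linear prediction head applied to the online branch.
W_pred = rng.normal(size=(dim, dim))
loss_generic = noncontrastive_loss(z1 @ W_pred, z2)

# Collapsed solution: every input maps to the same constant vector,
# and the prediction head is the identity.
c = np.ones((batch, dim))
loss_collapsed = noncontrastive_loss(c @ np.eye(dim), c)

print("generic loss:  ", loss_generic)
print("collapsed loss:", loss_collapsed)
```

The collapsed configuration reaches the minimum of the loss while carrying no information about the input, which is why it is notable that gradient descent with a prediction head avoids it in practice.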

name change, non-contrastive self-supervised learning, prediction head, (8 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)