
Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning

Neural Information Processing Systems

We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods intrinsically rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches 74.3% top-1 classification accuracy on ImageNet using the standard linear evaluation protocol with a standard ResNet-50 architecture and 79.6% with a larger ResNet. We also show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks.
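The two ingredients described in the abstract, a prediction loss between the online and target branches and a slow-moving (exponential moving) average update of the target weights, can be sketched in a few lines. This is a minimal NumPy sketch with plain vectors standing in for real network outputs; `byol_loss` and `ema_update` are illustrative names, not the authors' code:

```python
import numpy as np

def byol_loss(online_pred, target_proj):
    """Mean squared error between L2-normalized vectors, equivalent to
    2 - 2 * cosine similarity: the online prediction is pushed toward
    the target projection of the other augmented view."""
    p = online_pred / np.linalg.norm(online_pred, axis=-1, keepdims=True)
    z = target_proj / np.linalg.norm(target_proj, axis=-1, keepdims=True)
    return float(2 - 2 * (p * z).sum(axis=-1).mean())

def ema_update(target_params, online_params, tau=0.996):
    """Slow-moving average: the target network trails the online network
    instead of receiving gradients directly."""
    return {name: tau * target_params[name] + (1 - tau) * online_params[name]
            for name in target_params}
```

Note that gradients flow only through the online branch; the target branch is updated exclusively via `ema_update` after each training step, which is what removes the need for negative pairs.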


Learning More with Less: Self-Supervised Approaches for Low-Resource Speech Emotion Recognition

Gong, Ziwei, Shi, Pengyuan, Donbekci, Kaan, Ai, Lin, Chen, Run, Sasu, David, Wu, Zehui, Hirschberg, Julia

arXiv.org Artificial Intelligence

Speech Emotion Recognition (SER) has seen significant progress with deep learning, yet remains challenging for Low-Resource Languages (LRLs) due to the scarcity of annotated data. In this work, we explore unsupervised learning to improve SER in low-resource settings. Specifically, we investigate contrastive learning (CL) and Bootstrap Your Own Latent (BYOL) as self-supervised approaches to enhance cross-lingual generalization. Our methods achieve notable F1 score improvements of 10.6% in Urdu, 15.2% in German, and 13.9% in Bangla, demonstrating their effectiveness in LRLs. Additionally, we analyze model behavior to provide insights on key factors influencing performance across languages, and also highlight challenges in low-resource SER. This work provides a foundation for developing more inclusive, explainable, and robust emotion recognition systems for underrepresented languages.


Review for NeurIPS paper: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning

Neural Information Processing Systems

Weaknesses: - As mentioned in the paper, the proposed method has a trivial solution in which both models output zeros. To me, the method is too simple to be true. I tried to reimplement it, but without success. It is highly recommended to open-source the code for reproducible research. How can you learn detection with a frozen representation? Please use the standard settings, e.g., as in MoCo.


Review for NeurIPS paper: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning

Neural Information Processing Systems

This paper proposes a new method for self-supervised learning, which doesn't require negative pairs, unlike other contrastive approaches. It instead makes use of a target network. The reviewers unanimously voted to accept -- they really liked this paper and found it to be quite novel.


BYOL -- Bootstrap Your Own Latent

#artificialintelligence

Self-supervised learning is a machine learning method in which the model learns from a supervisory signal derived from the data itself, unlike supervised learning, where a separate label is specified for each observation. It is also known as representation learning. The model's learned representation is then used for downstream tasks, as with BERT, where pretrained language models are applied to tasks like text classification. Here, we can use a linear classifier on top of the learned self-supervised model for prediction. Recently, self-supervised learning has seen a great surge in the number of published papers, for a few obvious reasons, including the availability of unlabelled data.
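The linear-evaluation idea above (training only a linear classifier on frozen self-supervised features) can be sketched with plain NumPy. The features here are random stand-ins for a real encoder's output, the labels are synthetic, and the least-squares probe is an illustrative simplification of the usual logistic-regression protocol:

```python
import numpy as np

# Hypothetical frozen features, standing in for the output of a pretrained
# self-supervised encoder (in practice, e.g. a ResNet-50 backbone).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))           # (n_samples, feature_dim)
labels = (features[:, 0] > 0).astype(float)    # toy labels for illustration

# Linear evaluation protocol: fit only a linear map on top of the frozen
# representation (here via least squares); the encoder is never updated.
X = np.hstack([features, np.ones((200, 1))])   # append a bias column
w, *_ = np.linalg.lstsq(X, labels, rcond=None)
accuracy = float(((X @ w > 0.5) == labels.astype(bool)).mean())
```

The key point is that the classifier's quality is a proxy for the quality of the frozen representation itself, which is why linear evaluation is the standard benchmark for methods like BYOL.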