Inductive Learning
Embarrassingly Simple MixUp for Time-series
Aggarwal, Karan, Srivastava, Jaideep
Labeling time series data is expensive because it requires domain expertise and because the data are dynamic. Hence, we often have to deal with limited labeled data settings. Data augmentation techniques have been successfully deployed in domains like computer vision to make better use of existing labeled data. We adapt one of the most commonly used techniques, MixUp, to the time series domain. Our proposed methods, MixUp++ and LatentMixUp++, use simple modifications to perform interpolation in the raw time series and in the classification model's latent space, respectively. We also extend these methods with semi-supervised learning to exploit unlabeled data. We observe significant improvements of 1% to 15% on time series classification on two public datasets, in both low and high labeled data regimes, with LatentMixUp++.
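As a rough illustration of the interpolation idea, the sketch below applies plain MixUp to two raw time series. It is not the authors' MixUp++ or LatentMixUp++, whose modifications the abstract does not spell out. The function name `mixup_pair`, the default `alpha`, and the one-hot label format are assumptions made for this example.

```python
import random

def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Plain MixUp on two equal-length series and their one-hot labels.

    Hypothetical helper for illustration; `alpha` is an assumed default.
    """
    if len(x_a) != len(x_b):
        raise ValueError("series must have the same length")
    # Mixing coefficient drawn from Beta(alpha, alpha), so it lies in [0, 1].
    lam = rng.betavariate(alpha, alpha)
    # Interpolate the inputs and the labels with the same coefficient.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam
```

Because inputs and labels share the same coefficient, the mixed label remains a valid probability vector whenever both input labels are.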
Weighted Ensemble Self-Supervised Learning
Ruan, Yangjun, Singh, Saurabh, Morningstar, Warren, Alemi, Alexander A., Ioffe, Sergey, Fischer, Ian, Dillon, Joshua V.
Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the representation backbone; this choice yields an efficient ensemble method that incurs a small training cost and requires no architectural changes or computational overhead to downstream evaluation. The effectiveness of our method is demonstrated with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both in multiple evaluation metrics on ImageNet-1K, particularly in the few-shot setting. We explore several weighting schemes and find that those which increase the diversity of ensemble heads lead to better downstream evaluation results. Thorough experiments yield improved prior art baselines which our method still surpasses; e.g., our overall improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning. These successes have encouraged increasingly advanced SSL techniques (e.g., Grill et al., 2020; Zbontar et al., 2021; He et al., 2022). Perhaps surprisingly however, a simple and otherwise common idea has received limited consideration: ensembling. Ensembling combines predictions from multiple trained models and has proven effective at improving model accuracy (Hansen & Salamon, 1990; Perrone & Cooper, 1992) and capturing predictive uncertainty in supervised learning (Lakshminarayanan et al., 2017; Ovadia et al., 2019). Ensembling in the SSL regime is nuanced, however; since the goal is to learn useful representations from unlabeled data, it is less obvious where and how to ensemble. 
We explore these questions in this work.
Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D Videos
Lu, Shiyang, Deng, Yunfu, Boularias, Abdeslam, Bekris, Kostas
This work proposes a self-supervised learning system for segmenting rigid objects in RGB images. The proposed pipeline is trained on unlabeled RGB-D videos of static objects, which can be captured with a camera carried by a mobile robot. A key feature of the self-supervised training process is a graph-matching algorithm that operates on the over-segmentation output of the point cloud that is reconstructed from each video. The graph matching, along with point cloud registration, is able to find reoccurring object patterns across videos and combine them into 3D object pseudo labels, even under occlusions or different viewing angles. Projected 2D object masks from 3D pseudo labels are used to train a pixel-wise feature extractor through contrastive learning. During online inference, a clustering method uses the learned features to cluster foreground pixels into object segments. Experiments highlight the method's effectiveness on both real and synthetic video datasets, which include cluttered scenes of tabletop objects. The proposed method outperforms existing unsupervised methods for object segmentation by a large margin.
Structure-aware Protein Self-supervised Learning
Chen, Can, Zhou, Jingbo, Wang, Fan, Liu, Xue, Dou, Dejing
Protein representation learning methods have shown great potential to yield useful representations for many downstream tasks, especially protein classification. Moreover, a few recent studies have shown great promise in addressing insufficient labels of proteins with self-supervised learning methods. However, existing protein language models are usually pretrained on protein sequences without considering the important protein structural information. To this end, we propose a novel structure-aware protein self-supervised learning method to effectively capture structural information of proteins. In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information with self-supervised tasks from a pairwise residue distance perspective and a dihedral angle perspective, respectively. Furthermore, we propose to leverage the available protein language model pretrained on protein sequences to enhance the self-supervised learning. Specifically, we identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme. Experiments on several supervised downstream tasks verify the effectiveness of our proposed method. The code of the proposed method is available at https://github.com/GGchen1997/STEPS_Bioinformatics.
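To make the "pairwise residue distance perspective" concrete, the sketch below computes a distance matrix from per-residue 3D coordinates, which is one kind of quantity a structure-aware pretraining task could be asked to predict. This is an assumption for illustration only: the abstract does not say how the authors build their targets, and `pairwise_distances` is a hypothetical helper, not part of their released code.

```python
import math

def pairwise_distances(coords):
    """Return the matrix of Euclidean distances between residue coordinates.

    `coords` is a list of (x, y, z) tuples, one per residue (assumed format).
    """
    return [[math.dist(p, q) for q in coords] for p in coords]
```

The resulting matrix is symmetric with a zero diagonal, so only the upper triangle needs to be stored or predicted.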
EMP-SSL: Towards Self-Supervised Learning in One Training Epoch
Tong, Shengbang, Chen, Yubei, Ma, Yi, Lecun, Yann
Recently, self-supervised learning (SSL) has achieved tremendous success in learning image representation. Despite the empirical success, most self-supervised learning methods are rather "inefficient" learners, typically taking hundreds of training epochs to fully converge. In this work, we show that the key towards efficient self-supervised learning is to increase the number of crops from each image instance. Leveraging one of the state-of-the-art SSL methods, we introduce a simplistic form of self-supervised learning method called Extreme-Multi-Patch Self-Supervised-Learning (EMP-SSL) that does not rely on many heuristic techniques for SSL, such as weight sharing between the branches, feature-wise normalization, output quantization, and stop gradient, and that reduces the training epochs by two orders of magnitude. We show that the proposed method is able to converge to 85.1% on CIFAR-10, 58.5% on CIFAR-100, 38.1% on Tiny ImageNet and 58.5% on ImageNet-100 in just one epoch. Furthermore, the proposed method achieves 91.5% on CIFAR-10, 70.1% on CIFAR-100, 51.5% on Tiny ImageNet and 78.9% on ImageNet-100 with linear probing in less than ten training epochs. In addition, we show that EMP-SSL shows significantly better transferability to out-of-domain datasets compared to baseline SSL methods. We will release the code at https://github.com/tsb0601/EMP-SSL.
A Theoretical Study of Inductive Biases in Contrastive Learning
Understanding self-supervised learning is important but challenging. Previous theoretical works study the role of pretraining losses, and view neural networks as general black boxes. However, the recent work of Saunshi et al. argues that the model architecture -- a component largely ignored by previous works -- also has significant influences on the downstream performance of self-supervised learning. In this work, we provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class. In particular, we focus on contrastive learning -- a popular self-supervised learning method that is widely used in the vision domain. We show that when the model has limited capacity, contrastive representations would recover certain special clustering structures that are compatible with the model architecture, but ignore many other clustering structures in the data distribution. As a result, our theory can capture the more realistic setting where contrastive representations have much lower dimensionality than the number of clusters in the data distribution. We instantiate our theory on several synthetic data distributions, and provide empirical evidence to support the theory.
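For readers unfamiliar with contrastive learning, the sketch below shows one widely used contrastive objective, an InfoNCE-style loss for a single anchor. The abstract does not state which loss the analysis covers, so the specific form, the `temperature` default, and the function name `info_nce` are assumptions made for illustration.

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor (illustrative sketch).

    Vectors are plain lists of floats and are L2-normalized before
    cosine similarities are computed.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def normalize(v):
        norm = math.sqrt(dot(v, v))
        return [a / norm for a in v]

    a = normalize(anchor)
    # The positive pair's logit comes first, then one logit per negative.
    logits = [dot(a, normalize(positive)) / temperature]
    logits += [dot(a, normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Cross-entropy with the positive treated as the correct class.
    return log_sum - logits[0]
```

The loss is small when the anchor is closer to its positive than to any negative, which is what pulls augmentations of the same input into a common cluster.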
Supervised Learning of Probability Distributions by Neural Networks
In the past thirty years many researchers have studied the question of supervised learning in 'neural'-like networks. Recently a learning algorithm called 'back propagation' [1-4] or the 'generalized delta-rule' has been applied to numerous problems including the mapping of text to phonemes [5], the diagnosis of illnesses [6] and the classification of sonar targets [7]. In these applications, it would often be natural to consider imperfect, or probabilistic information.
Network Generality, Training Required, and Precision Required
We show how to estimate (1) the number of functions that can be implemented by a particular network architecture, (2) how much analog precision is needed in the connections in the network, and (3) the number of training examples the network must see before it can be expected to form reliable generalizations. Consider the following objectives: First, the network should be very powerful and versatile, i.e., it should implement any function (truth table) you like, and secondly, it should learn easily, forming meaningful generalizations from a small number of training examples. Well, it is information-theoretically impossible to create such a network. We will present here a simplified argument; a more complete and sophisticated version can be found in Denker et al. (1987). It is customary to regard learning as a dynamical process: adjusting the weights (etc.) in a single network.
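One way to get a feel for the tension described above is a simple counting bound, sketched below. It is not the authors' derivation. It only uses two facts: there are 2^(2^n) truth tables on n binary inputs, and a network whose W weights are each stored with b bits has at most 2^(W*b) distinct settings. The function names are hypothetical.

```python
def num_truth_tables(n_inputs):
    """Number of distinct Boolean functions on n binary inputs."""
    return 2 ** (2 ** n_inputs)

def max_functions(n_weights, bits_per_weight):
    """Upper bound on the functions a network with quantized weights can realize."""
    return 2 ** (n_weights * bits_per_weight)

def passes_counting_bound(n_inputs, n_weights, bits_per_weight):
    """Necessary (not sufficient) condition for implementing every truth table."""
    return max_functions(n_weights, bits_per_weight) >= num_truth_tables(n_inputs)
```

For example, a network with 100 weights at 8 bits each has 800 bits of configuration, fewer than the 1024 bits needed to index every truth table on 10 inputs, so it cannot implement all of them.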
Scaling and Generalization in Neural Networks: A Case Study
The issues of scaling and generalization have emerged as key issues in current studies of supervised learning from examples in neural networks. Questions such as how many training patterns and training cycles are needed for a problem of a given size and difficulty, how to represent the inputs, and how to choose useful training exemplars, are of considerable theoretical and practical importance. Several intuitive rules of thumb have been obtained from empirical studies, but as yet there are few rigorous results. In this paper we summarize a study of generalization in the simplest possible case: perceptron networks learning linearly separable functions. The task chosen was the majority function (i.e. return a 1 if a majority of the input units are on), a predicate with a number of useful properties.
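The setting can be reproduced in miniature with the classic perceptron learning rule, sketched below on the majority function. This is an illustrative sketch, not the training procedure or the experimental protocol used in the study. The function names and the `max_epochs` default are assumptions.

```python
from itertools import product

def majority(bits):
    """Return 1 if more than half of the input units are on, else 0."""
    return 1 if sum(bits) > len(bits) / 2 else 0

def predict(weights, bias, x):
    """Linear threshold unit: fire when the weighted sum exceeds zero."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, max_epochs=1000):
    """Perceptron rule: adjust weights only on misclassified samples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x in samples:
            delta = majority(x) - predict(weights, bias, x)
            if delta != 0:
                mistakes += 1
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights, bias

# Usage: train on all 5-bit patterns (majority is linearly separable).
patterns = [list(p) for p in product([0, 1], repeat=5)]
weights, bias = train_perceptron(patterns)
```

Training on a random subset of the patterns and measuring accuracy on the held-out rest gives a small-scale version of the generalization question the abstract raises.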
Training a 3-Node Neural Network is NP-Complete
We consider a 2-layer, 3-node, n-input neural network whose nodes compute linear threshold functions of their inputs. We show that it is NP-complete to decide whether there exist weights and thresholds for the three nodes of this network so that it will produce output consistent with a given set of training examples. We extend the result to other simple networks. This result suggests that those looking for perfect training algorithms cannot escape inherent computational difficulties just by considering only simple or very regular networks. It also suggests the importance, given a training problem, of finding an appropriate network and input encoding for that problem.
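The decision problem can be made concrete with a small sketch. It assumes the network consists of two hidden threshold nodes feeding one output threshold node, which matches the "2-layer, 3-node" description but is not spelled out in the abstract. The function names and the parameter layout are hypothetical.

```python
def threshold(weights, bias, inputs):
    """Linear threshold unit: output 1 when the weighted sum plus bias is positive."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def three_node_net(params, x):
    """Two hidden threshold nodes feeding one output threshold node."""
    (w1, b1), (w2, b2), (w3, b3) = params
    hidden = [threshold(w1, b1, x), threshold(w2, b2, x)]
    return threshold(w3, b3, hidden)

def is_consistent(params, examples):
    """Check whether the network reproduces every (input, label) pair."""
    return all(three_node_net(params, x) == y for x, y in examples)

# Usage: hand-picked weights that realize XOR (hidden nodes act as OR and AND).
xor_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
xor_params = [((1, 1), -0.5), ((1, 1), -1.5), ((1, -1), -0.5)]
print(is_consistent(xor_params, xor_examples))
```

Checking a given set of weights, as `is_consistent` does, takes time linear in the number of examples. The hardness result concerns deciding whether any consistent weights exist at all.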