Inductive Learning
Neural Eigenfunctions Are Structured Representation Learners
Deng, Zhijie, Shi, Jiaxin, Zhang, Hao, Cui, Peng, Lu, Cewu, Zhu, Jun
This paper introduces a structured, adaptive-length deep representation called Neural Eigenmap. Unlike prior spectral methods such as Laplacian Eigenmap that operate in a nonparametric manner, Neural Eigenmap leverages NeuralEF (Deng et al., 2022) to parametrically model eigenfunctions using a neural network. We show that, when the eigenfunction is derived from positive relations in a data augmentation setup, applying NeuralEF results in an objective function that resembles those of popular self-supervised learning methods, with an additional symmetry-breaking property that leads to structured representations where features are ordered by importance. We demonstrate using such representations as adaptive-length codes in image retrieval systems. By truncation according to feature importance, our method requires up to 16 shorter representation length than leading self-supervised learning ones to achieve similar retrieval performance. We further apply our method to graph data and report strong results on a node representation learning benchmark with more than one million nodes. Automatically learning representations from unlabelled data is a long-standing challenge in machine learning. Often, the motivation is to map data to a vector space where the geometric distance reflects semantic closeness. This enables, for example, retrieving semantically related information via finding nearest neighbors, or discovering concepts with clustering. One can also pass such representations as inputs to supervised learning procedures, which removes the need for feature engineering.
Image Synthesis-based Late Stage Cancer Augmentation and Semi-Supervised Segmentation for MRI Rectal Cancer Staging
Sasuga, Saeko, Kudo, Akira, Kitamura, Yoshiro, Iizuka, Satoshi, Simo-Serra, Edgar, Hamabe, Atsushi, Ishii, Masayuki, Takemasa, Ichiro
Rectal cancer is one of the most common diseases and a major cause of mortality. For deciding rectal cancer treatment plans, T-staging is important. However, evaluating the index from preoperative MRI images requires high radiologists' skill and experience. Therefore, the aim of this study is to segment the mesorectum, rectum, and rectal cancer region so that the system can predict T-stage from segmentation results. Generally, shortage of large and diverse dataset and high quality annotation are known to be the bottlenecks in computer aided diagnostics development. Regarding rectal cancer, advanced cancer images are very rare, and per-pixel annotation requires high radiologists' skill and time. Therefore, it is not feasible to collect comprehensive disease patterns in a training dataset. To tackle this, we propose two kinds of approaches of image synthesis-based late stage cancer augmentation and semi-supervised learning which is designed for T-stage prediction. In the image synthesis data augmentation approach, we generated advanced cancer images from labels. The real cancer labels were deformed to resemble advanced cancer labels by artificial cancer progress simulation. Next, we introduce a T-staging loss which enables us to train segmentation models from per-image T-stage labels. The loss works to keep inclusion/invasion relationships between rectum and cancer region consistent to the ground truth T-stage. The verification tests show that the proposed method obtains the best sensitivity (0.76) and specificity (0.80) in distinguishing between over T3 stage and underT2. In the ablation studies, our semi-supervised learning approach with the T-staging loss improved specificity by 0.13. Adding the image synthesis-based data augmentation improved the DICE score of invasion cancer area by 0.08 from baseline.
Self-Supervised Behavior Cloned Transformers are Path Crawlers for Text Games
In this work, we introduce a self-supervised behavior cloning transformer for text games, which are challenging benchmarks for multi-step reasoning in virtual environments. Traditionally, Behavior Cloning Transformers excel in such tasks but rely on supervised training data. Our approach auto-generates training data by exploring trajectories (defined by common macro-action sequences) that lead to reward within the games, while determining the generality and utility of these trajectories by rapidly training small models then evaluating their performance on unseen development games. Through empirical analysis, we show our method consistently uncovers generalizable training data, achieving about 90\% performance of supervised systems across three benchmark text games.
Spectral Temporal Contrastive Learning
Morin, Sacha, Nath, Somjit, Kahou, Samira Ebrahimi, Wolf, Guy
Learning useful data representations without requiring labels is a cornerstone of modern deep learning. Self-supervised learning methods, particularly contrastive learning (CL), have proven successful by leveraging data augmentations to define positive pairs. This success has prompted a number of theoretical studies to better understand CL and investigate theoretical bounds for downstream linear probing tasks. This work is concerned with the temporal contrastive learning (TCL) setting where the sequential structure of the data is used instead to define positive pairs, which is more commonly used in RL and robotics contexts. In this paper, we adapt recent work on Spectral CL to formulate Spectral Temporal Contrastive Learning (STCL). We discuss a population loss based on a state graph derived from a time-homogeneous reversible Markov chain with uniform stationary distribution. The STCL loss enables to connect the linear probing performance to the spectral properties of the graph, and can be estimated by considering previously observed data sequences as an ensemble of MCMC chains.
Retrieval-Based Reconstruction For Time-series Contrastive Learning
Xu, Maxwell A., Moreno, Alexander, Wei, Hui, Marlin, Benjamin M., Rehg, James M.
The success of self-supervised contrastive learning hinges on identifying positive data pairs that, when pushed together in embedding space, encode useful information for subsequent downstream tasks. However, in time-series, this is challenging because creating positive pairs via augmentations may break the original semantic meaning. We hypothesize that if we can retrieve information from one subsequence to successfully reconstruct another subsequence, then they should form a positive pair. Harnessing this intuition, we introduce our novel approach: REtrieval-BAsed Reconstruction (REBAR) contrastive learning. First, we utilize a convolutional cross-attention architecture to calculate the REBAR error between two different time-series. Then, through validation experiments, we show that the REBAR error is a predictor of mutual class membership, justifying its usage as a positive/negative labeler. Finally, once integrated into a contrastive learning framework, our REBAR method can learn an embedding that achieves state-ofthe-art performance on downstream tasks across various modalities. Self-supervised learning uses the underlying structure within a dataset to learn rich and generalizable representations without labels, enabling fine-tuning on various downstream tasks. This reduces the need for large labeled datasets, which makes it an attractive approach for the time-series domain. With the advancement of sensor technologies, it is increasingly feasible to capture a large volume of data, but the cost of data labeling remains high. For example, in mobile health, acquiring labels requires burdensome real-time annotation (Rehg et al., 2017). Additionally, in medical applications such as ECG analysis, annotation is costly as it requires specialized medical expertise. Contrastive learning is a powerful self-supervised learning technique, which involves constructing and contrasting positive and negative pairs to yield an embedding space that captures semantic relationships.
Constrained Few-Shot Learning: Human-Like Low Sample Complexity Learning and Non-Episodic Text Classification
Few-shot learning (FSL) is an emergent paradigm of learning that attempts to learn to reason with low sample complexity to mimic the way humans learn, generalise and extrapolate from only a few seen examples. While FSL attempts to mimic these human characteristics, fundamentally, the task of FSL as conventionally formulated using meta-learning with episodic-based training does not in actuality align with how humans acquire and reason with knowledge. FSL with episodic training, while only requires $K$ instances of each test class, still requires a large number of labelled training instances from disjoint classes. In this paper, we introduce the novel task of constrained few-shot learning (CFSL), a special case of FSL where $M$, the number of instances of each training class is constrained such that $M \leq K$ thus applying a similar restriction during FSL training and test. We propose a method for CFSL leveraging Cat2Vec using a novel categorical contrastive loss inspired by cognitive theories such as fuzzy trace theory and prototype theory.
On the variants of SVM methods applied to GPR data to classify tack coat characteristics in French pavements: two experimental case studies
Andreoli, Grégory, Ihamouten, Amine, Nguyen, Mai Lan, Fargier, Yannick, Fauchard, Cyrille, Simonin, Jean-Michel, Buliuk, Viktoriia, Souriou, David, Dérobert, Xavier
Among the commonly used non-destructive techniques, the Ground Penetrating Radar (GPR) is one of the most widely adopted today for assessing pavement conditions in France. However, conventional radar systems and their forward processing methods have shown their limitations for the physical and geometrical characterization of very thin layers such as tack coats. However, the use of Machine Learning methods applied to GPR with an inverse approach showed that it was numerically possible to identify the tack coat characteristics despite masking effects due to low timefrequency resolution noted in the raw B-scans. Thus, we propose in this paper to apply the inverse approach based on Machine Learning, already validated in previous works on numerical data, on two experimental cases with different pavement structures. The first case corresponds to a validation on known pavement structures on the Gustave Eiffel University (Nantes, France) with its pavement fatigue carousel and the second case focuses on a new real road in Vend{\'e}e department (France). In both case studies, the performances of SVM/SVR methods showed the efficiency of supervised learning methods to classify and estimate the emulsion proportioning in the tack coats.
SSL-Auth: An Authentication Framework by Fragile Watermarking for Pre-trained Encoders in Self-supervised Learning
Li, Xiaobei, Yin, Changchun, Zhu, Liyue, Xu, Xiaogang, Fang, Liming, Wang, Run, Lin, Chenhao
Self-supervised learning (SSL), a paradigm harnessing unlabeled datasets to train robust encoders, has recently witnessed substantial success. These encoders serve as pivotal feature extractors for downstream tasks, demanding significant computational resources. Nevertheless, recent studies have shed light on vulnerabilities in pre-trained encoders, including backdoor and adversarial threats. Safeguarding the intellectual property of encoder trainers and ensuring the trustworthiness of deployed encoders pose notable challenges in SSL. To bridge these gaps, we introduce SSL-Auth, the first authentication framework designed explicitly for pre-trained encoders. SSL-Auth leverages selected key samples and employs a well-trained generative network to reconstruct watermark information, thus affirming the integrity of the encoder without compromising its performance. By comparing the reconstruction outcomes of the key samples, we can identify any malicious alterations. Comprehensive evaluations conducted on a range of encoders and diverse downstream tasks demonstrate the effectiveness of our proposed SSL-Auth.
FroSSL: Frobenius Norm Minimization for Self-Supervised Learning
Skean, Oscar, Dhakal, Aayush, Jacobs, Nathan, Giraldo, Luis Gonzalo Sanchez
Self-supervised learning (SSL) is an increasingly popular paradigm for representation learning. Recent methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, with each family having its own approach to avoiding informational collapse. While dimension-contrastive methods converge to similar solutions as sample-contrastive methods, it can be empirically shown that some methods require more epochs of training to converge. Motivated by closing this divide, we present the objective function FroSSL which is both sample- and dimension-contrastive up to embedding normalization. FroSSL works by minimizing covariance Frobenius norms for avoiding collapse and minimizing mean-squared error for augmentation invariance. We show that FroSSL converges more quickly than a variety of other SSL methods and provide theoretical and empirical support that this faster convergence is due to how FroSSL affects the eigenvalues of the embedding covariance matrices. We also show that FroSSL learns competitive representations on linear probe evaluation when used to train a ResNet18 on the CIFAR-10, CIFAR-100, STL-10, and ImageNet datasets.
Neural Priming for Sample-Efficient Adaptation
Wallingford, Matthew, Ramanujan, Vivek, Fang, Alex, Kusupati, Aditya, Mottaghi, Roozbeh, Kembhavi, Aniruddha, Schmidt, Ludwig, Farhadi, Ali
We propose Neural Priming, a technique for adapting large pretrained models to distribution shifts and downstream tasks given few or no labeled examples. Presented with class names or unlabeled test samples, Neural Priming enables the model to recall and conditions its parameters on relevant data seen throughout pretraining, thereby priming it for the test distribution. Neural Priming can be performed at test time, even for pretraining datasets as large as LAION-2B. Performing lightweight updates on the recalled data significantly improves accuracy across a variety of distribution shift and transfer learning benchmarks. Concretely, in the zero-shot setting, we see a 2.45% improvement in accuracy on ImageNet and 3.81% accuracy improvement on average across standard transfer learning benchmarks. Further, using Neural Priming at inference to adapt to distribution shift, we see a 1.41% accuracy improvement on ImageNetV2. These results demonstrate the effectiveness of Neural Priming in addressing the challenge of limited labeled data and changing distributions. Code is available at github.com/RAIVNLab/neural-priming.