Inductive Learning
Variance-Covariance Regularization Improves Representation Learning
Zhu, Jiachen, Shwartz-Ziv, Ravid, Chen, Yubei, LeCun, Yann
Transfer learning has emerged as a key approach in the machine learning domain, enabling the application of knowledge derived from one domain to improve performance on subsequent tasks. Given the often limited information about these subsequent tasks, a strong transfer learning approach calls for the model to capture a diverse range of features during the initial pretraining stage. However, recent research suggests that, without sufficient regularization, the network tends to concentrate on features that primarily reduce the pretraining loss function. This tendency can result in inadequate feature learning and impaired generalization capability for target tasks. To address this issue, we propose Variance-Covariance Regularization (VCR), a regularization technique aimed at fostering diversity in the learned network features. Drawing inspiration from recent advancements in the self-supervised learning approach, our approach promotes learned representations that exhibit high variance and minimal covariance, thus preventing the network from focusing solely on loss-reducing features. We empirically validate the efficacy of our method through comprehensive experiments coupled with in-depth analytical studies on the learned representations. In addition, we develop an efficient implementation strategy that assures minimal computational overhead associated with our method. Our results indicate that VCR is a powerful and efficient method for enhancing transfer learning performance for both supervised learning and self-supervised learning, opening new possibilities for future research in this domain.
Adversarial Resilience in Sequential Prediction via Abstention
Goel, Surbhi, Hanneke, Steve, Moran, Shay, Shetty, Abhishek
We study the problem of sequential prediction in the stochastic setting with an adversary that is allowed to inject clean-label adversarial (or out-of-distribution) examples. Algorithms designed to handle purely stochastic data tend to fail in the presence of such adversarial examples, often leading to erroneous predictions. This is undesirable in many high-stakes applications such as medical recommendations, where abstaining from predictions on adversarial examples is preferable to misclassification. On the other hand, assuming fully adversarial data leads to very pessimistic bounds that are often vacuous in practice. To capture this motivation, we propose a new model of sequential prediction that sits between the purely stochastic and fully adversarial settings by allowing the learner to abstain from making a prediction at no cost on adversarial examples. Assuming access to the marginal distribution on the non-adversarial examples, we design a learner whose error scales with the VC dimension (mirroring the stochastic setting) of the hypothesis class, as opposed to the Littlestone dimension which characterizes the fully adversarial setting. Furthermore, we design a learner for VC dimension~1 classes, which works even in the absence of access to the marginal distribution. Our key technical contribution is a novel measure for quantifying uncertainty for learning VC classes, which may be of independent interest.
Judge in Trump classified documents case sets preliminary trial date for Aug. 14
Former President Donald Trump defends himself against allegations he mishandled classified documents on'Special Report.' Former President Donald Trump's trial on 37 federal felony counts is poised to begin on August 14, a judge announced Tuesday. Federal Judge Aileen Cannon announced the preliminary court date Tuesday, but the final date for Trump's trial is likely to change as the former president's legal team is expected to request a delay. Trump has vowed to continue his 2024 presidential campaign despite his legal jeopardy. Trump is accused of 37 counts, including willful retention of national defense information, conspiracy to obstruct justice and making false statements.
Spectral Augmentation for Self-Supervised Learning on Graphs
Lin, Lu, Chen, Jinghui, Wang, Hongning
Graph contrastive learning (GCL), as an emerging self-supervised learning technique on graphs, aims to learn representations via instance discrimination. Its performance heavily relies on graph augmentation to reflect invariant patterns that are robust to small perturbations; yet it still remains unclear about what graph invariance GCL should capture. Recent studies mainly perform topology augmentations in a uniformly random manner in the spatial domain, ignoring its influence on the intrinsic structural properties embedded in the spectral domain. In this work, we aim to find a principled way for topology augmentations by exploring the invariance of graphs from the spectral perspective. We develop spectral augmentation which guides topology augmentations by maximizing the spectral change. Extensive experiments on both graph and node classification tasks demonstrate the effectiveness of our method in unsupervised learning, as well as the generalization capability in transfer learning and the robustness property under adversarial attacks. Graph neural networks (GNNs) (Kipf & Welling, 2017; Veliฤkoviฤ et al., 2018; Xu et al., 2019) have advanced graph representation learning in a (semi-)supervised manner, yet it requires supervised labels and may fail to generalize (Rong et al., 2020). To obtain more generalizable and transferable representations, the self-supervised learning (SSL) paradigm emerges which enables GNNs to learn from pretext tasks constructed on unlabeled graph data (Hu et al., 2020c;b; You et al., 2020b; Jin et al., 2020a). As a state-of-the-art SSL technique, graph contrastive learning (GCL) has attracted the most attention due to its remarkable empirical performance (Velickovic et al., 2019; Zhu et al., 2020; Hassani & Khasahmadi, 2020; You et al., 2021; Suresh et al., 2021; Thakoor et al., 2021). A typical GCL method works by creating augmented views of the input graph and learning representations by contrasting related graph objects against unrelated ones. The goal of GCL is to capture graph invariance by maximizing the congruence between node or graph representations in augmented views. This makes graph augmentation one of the most critical designs in GCL, as it determines the effectiveness of the contrastive objective. However, despite that various GCL methods have been proposed, it remains a mystery about what graph invariance GCL should capture. Unlike images, which can be augmented to naturally highlight the main subject from the background, it is less obvious to design the most effective graph augmentation due to the complicated topology structure of diverse nature in different graphs (e.g., citation networks (Sen et al., 2008), social networks (Morris et al., 2020), chemical and biomedical molecules (Li et al., 2021; Hu et al., 2020b)), as discussed in the survey (Ding et al., 2022).
Comparative Study on Semi-supervised Learning Applied for Anomaly Detection in Hydraulic Condition Monitoring System
Dong, Yongqi, Chen, Kejia, Ma, Zhiyuan
Condition-based maintenance is becoming increasingly important in hydraulic systems. However, anomaly detection for these systems remains challenging, especially since that anomalous data is scarce and labeling such data is tedious and even dangerous. Therefore, it is advisable to make use of unsupervised or semi-supervised methods, especially for semi-supervised learning which utilizes unsupervised learning as a feature extraction mechanism to aid the supervised part when only a small number of labels are available. This study systematically compares semi-supervised learning methods applied for anomaly detection in hydraulic condition monitoring systems. Firstly, thorough data analysis and feature learning were carried out to understand the open-sourced hydraulic condition monitoring dataset. Then, various methods were implemented and evaluated including traditional stand-alone semi-supervised learning models (e.g., one-class SVM, Robust Covariance), ensemble models (e.g., Isolation Forest), and deep neural network based models (e.g., autoencoder, Hierarchical Extreme Learning Machine (HELM)). Typically, this study customized and implemented an extreme learning machine based semi-supervised HELM model and verified its superiority over other semi-supervised methods. Extensive experiments show that the customized HELM model obtained state-of-the-art performance with the highest accuracy (99.5%), the lowest false positive rate (0.015), and the best F1-score (0.985) beating other semi-supervised methods.
INoD: Injected Noise Discriminator for Self-Supervised Representation Learning in Agricultural Fields
Hindel, Julia, Gosala, Nikhil, Bregler, Kevin, Valada, Abhinav
Perception datasets for agriculture are limited both in quantity and diversity which hinders effective training of supervised learning approaches. Self-supervised learning techniques alleviate this problem, however, existing methods are not optimized for dense prediction tasks in agriculture domains which results in degraded performance. In this work, we address this limitation with our proposed Injected Noise Discriminator (INoD) which exploits principles of feature replacement and dataset discrimination for self-supervised representation learning. INoD interleaves feature maps from two disjoint datasets during their convolutional encoding and predicts the dataset affiliation of the resultant feature map as a pretext task. Our approach enables the network to learn unequivocal representations of objects seen in one dataset while observing them in conjunction with similar features from the disjoint dataset. This allows the network to reason about higher-level semantics of the entailed objects, thus improving its performance on various downstream tasks. Additionally, we introduce the novel Fraunhofer Potato 2022 dataset consisting of over 16,800 images for object detection in potato fields. Extensive evaluations of our proposed INoD pretraining strategy for the tasks of object detection, semantic segmentation, and instance segmentation on the Sugar Beets 2016 and our potato dataset demonstrate that it achieves state-of-the-art performance.
Self-supervised learning of Split Invariant Equivariant representations
Garrido, Quentin, Najman, Laurent, Lecun, Yann
Recent progress has been made towards learning invariant or equivariant representations with self-supervised learning. While invariant methods are evaluated on large scale datasets, equivariant ones are evaluated in smaller, more controlled, settings. We aim at bridging the gap between the two in order to learn more diverse representations that are suitable for a wide range of tasks. We start by introducing a dataset called 3DIEBench, consisting of renderings from 3D models over 55 classes and more than 2.5 million images where we have full control on the transformations applied to the objects. We further introduce a predictor architecture based on hypernetworks to learn equivariant representations with no possible collapse to invariance. We introduce SIE (Split Invariant-Equivariant) which combines the hypernetwork-based predictor with representations split in two parts, one invariant, the other equivariant, to learn richer representations. We demonstrate significant performance gains over existing methods on equivariance related tasks from both a qualitative and quantitative point of view. We further analyze our introduced predictor and show how it steers the learned latent space. We hope that both our introduced dataset and approach will enable learning richer representations without supervision in more complex scenarios. Code and data are available at https://github.com/facebookresearch/SIE.
Efficient Video Representation Learning via Motion-Aware Token Selection
Hwang, Sunil, Yoon, Jaehong, Lee, Youngwan, Hwang, Sung Ju
Recently emerged Masked Video Modeling techniques demonstrated their potential by significantly outperforming previous methods in self-supervised learning for video. However, they require an excessive amount of computations and memory while predicting uninformative tokens/frames due to random masking strategies, requiring excessive computing power for training. (e.g., over 16 nodes with 128 NVIDIA A100 GPUs). To resolve this issue, we exploit the unequal information density among the patches in videos and propose a new token selection method, MATS: Motion-Aware Token Selection, that finds tokens containing rich motion features and drops uninformative ones during both self-supervised pre-training and fine-tuning. We further present an adaptive frame selection strategy that allows the model to focus on informative and causal frames with minimal redundancy. Our method significantly reduces computation and memory requirements, enabling the pre-training and fine-tuning on a single machine with 8 GPUs while achieving comparable performance to computation- and memory-heavy state-of-the-art methods on multiple benchmarks and on the uncurated Ego4D dataset. We are hopeful that the efficiency of our MATS will contribute to reducing the barrier to conducting further research on self-supervised learning for videos.
Self-supervised learning of hologram reconstruction using physics consistency
Huang, Luzhe, Chen, Hanlong, Liu, Tairan, Ozcan, Aydogan
The past decade has witnessed transformative applications of deep learning in various computational imaging, sensing and microscopy tasks. Due to the supervised learning schemes employed, these methods mostly depend on large-scale, diverse, and labeled training data. The acquisition and preparation of such training image datasets are often laborious and costly, also leading to biased estimation and limited generalization to new sample types. Here, we report a self-supervised learning model, termed GedankenNet, that eliminates the need for labeled or experimental training data, and demonstrate its effectiveness and superior generalization on hologram reconstruction tasks. Without prior knowledge about the sample types to be imaged, the self-supervised learning model was trained using a physics-consistency loss and artificial random images that are synthetically generated without any experiments or resemblance to real-world samples. After its self-supervised training, GedankenNet successfully generalized to experimental holograms of various unseen biological samples, reconstructing the phase and amplitude images of different types of objects using experimentally acquired test holograms. Without access to experimental data or knowledge of real samples of interest or their spatial features, GedankenNet's self-supervised learning achieved complex-valued image reconstructions that are consistent with the Maxwell's equations, and its output inference and object solutions accurately represent the wave propagation in free-space. GedankenNet framework also exhibits resilience to random, unknown perturbations in the physical forward model, including changes in the hologram distances, pixel size and illumination wavelength. This self-supervised learning of image reconstruction tasks creates new opportunities for various inverse problems in holography, microscopy and computational imaging fields.
Similarity-Aware Multimodal Prompt Learning for Fake News Detection
Jiang, Ye, Yu, Xiaomin, Wang, Yimin, Xu, Xiaoman, Song, Xingyi, Maynard, Diana
The standard paradigm for fake news detection mainly utilizes text information to model the truthfulness of news. However, the discourse of online fake news is typically subtle and it requires expert knowledge to use textual information to debunk fake news. Recently, studies focusing on multimodal fake news detection have outperformed text-only methods. Recent approaches utilizing the pre-trained model to extract unimodal features, or fine-tuning the pre-trained model directly, have become a new paradigm for detecting fake news. Again, this paradigm either requires a large number of training instances, or updates the entire set of pre-trained model parameters, making real-world fake news detection impractical. Furthermore, traditional multimodal methods fuse the cross-modal features directly without considering that the uncorrelated semantic representation might inject noise into the multimodal features. This paper proposes a Similarity-Aware Multimodal Prompt Learning (SAMPLE) framework. First, we incorporate prompt learning into multimodal fake news detection. Prompt learning, which only tunes prompts with a frozen language model, can reduce memory usage significantly and achieve comparable performances, compared with fine-tuning. We analyse three prompt templates with a soft verbalizer to detect fake news. In addition, we introduce the similarity-aware fusing method to adaptively fuse the intensity of multimodal representation and mitigate the noise injection via uncorrelated cross-modal features. For evaluation, SAMPLE surpasses the F1 and the accuracies of previous works on two benchmark multimodal datasets, demonstrating the effectiveness of the proposed method in detecting fake news. In addition, SAMPLE also is superior to other approaches regardless of few-shot and data-rich settings.