Inductive Learning
Affinity Uncertainty-based Hard Negative Mining in Graph Contrastive Learning
Niu, Chaoxi, Pang, Guansong, Chen, Ling
Hard negative mining has been shown to be effective in enhancing self-supervised contrastive learning (CL) on diverse data types, including graph CL (GCL). Existing hardness-aware CL methods typically treat the negative instances that are most similar to the anchor instance as hard negatives, which helps improve CL performance, especially on image data. However, on graph data this approach often fails to identify the hard negatives and instead produces many false negatives. This is mainly because the learned graph representations are not sufficiently discriminative, owing to oversmoothed representations and/or non-independent and identically distributed (non-i.i.d.) issues in graph data. To tackle this problem, this article proposes a novel approach that builds a discriminative model on collective affinity information (i.e., two sets of pairwise affinities between the negative instances and the anchor instance) to mine hard negatives in GCL. In particular, the proposed approach evaluates how confident/uncertain the discriminative model is about the affinity of each negative instance to an anchor instance to determine its hardness weight relative to the anchor instance. This uncertainty information is then incorporated into the existing GCL loss functions via a weighting term to enhance their performance. The enhanced GCL is theoretically grounded in that the resulting GCL loss is equivalent to a triplet loss with an adaptive margin that is exponentially proportional to the learned uncertainty of each negative instance. Extensive experiments on ten graph datasets show that our approach does the following: 1) consistently enhances different state-of-the-art (SOTA) GCL methods in both graph and node classification tasks and 2) significantly improves their robustness against adversarial attacks. Code is available at https://github.com/mala-lab/AUGCL.
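To make the weighting term mentioned in the abstract concrete, the following is a minimal sketch of an InfoNCE-style loss for a single anchor in which each negative carries its own weight. It is an illustration under stated assumptions, not the authors' implementation: the function name `weighted_info_nce`, the temperature value, and the way the weights enter the denominator are all hypothetical, and the weights themselves are placeholders for whatever the uncertainty model would produce.

```python
import math


def weighted_info_nce(pos_sim, neg_sims, neg_weights, temperature=0.5):
    """InfoNCE-style loss for one anchor with a per-negative weight.

    pos_sim:     similarity between the anchor and its positive view.
    neg_sims:    similarities between the anchor and each negative.
    neg_weights: hypothetical hardness weights, one per negative
                 (assumed to come from some uncertainty estimate).
    """
    if len(neg_sims) != len(neg_weights):
        raise ValueError("neg_sims and neg_weights must have the same length")
    pos = math.exp(pos_sim / temperature)
    # Each negative contributes to the denominator in proportion to its weight,
    # so a larger weight makes that negative count more in the loss.
    neg = sum(w * math.exp(s / temperature) for s, w in zip(neg_sims, neg_weights))
    return -math.log(pos / (pos + neg))


# With all weights equal to 1 this reduces to the usual InfoNCE form.
uniform = weighted_info_nce(0.8, [0.5, 0.1], [1.0, 1.0])
# Up-weighting the first negative increases the loss for the same similarities.
reweighted = weighted_info_nce(0.8, [0.5, 0.1], [2.0, 1.0])
```

Setting every weight to 1 recovers the unweighted loss, which is one way to check that a weighting scheme of this kind is a strict extension of the base objective.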
Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences
Skalski, Piotr, Sutton, David, Burrell, Stuart, Perez, Iker, Wong, Jason
Machine learning models underpin many modern financial systems for use cases such as fraud detection and churn prediction. Most are based on supervised learning with hand-engineered features, which relies heavily on the availability of labelled data. Large self-supervised generative models have shown tremendous success in natural language processing and computer vision, yet so far they haven't been adapted to multivariate time series of financial transactions. In this paper, we present a generative pretraining method that can be used to obtain contextualised embeddings of financial transactions. Benchmarks on public datasets demonstrate that it outperforms state-of-the-art self-supervised methods on a range of downstream tasks. We additionally perform large-scale pretraining of an embedding model using a corpus of data from 180 issuing banks containing 5.1 billion transactions and apply it to the card fraud detection problem on hold-out datasets.
[...] Their rapid success has been in no small part due to the development of self-supervised learning (SSL) methods such as autoregressive [27] and masked [13] language modelling, which have allowed models to learn contextual representations of input tokens without relying on labels. While these methods have already been successfully used with different modalities such as natural language [4, 11, 22, 27, 28], computer vision [26, 30], audio [3, 12], and tabular data [1, 20, 31], there has been little work to adapt them to the case of multivariate time series data. One example of such a data modality, of particular interest in this work, is streams of financial transactions: sequences of events representing transfers of funds between two entities. Each event can be described by a set of numerical or categorical features, such as the timestamp, card number, transaction amount, merchant [...]
Balancing Continual Learning and Fine-tuning for Human Activity Recognition
Tang, Chi Ian, Qendro, Lorena, Spathis, Dimitris, Kawsar, Fahim, Mathur, Akhil, Mascolo, Cecilia
Wearable-based Human Activity Recognition (HAR) is a key task in human-centric machine learning because it is fundamental to understanding human behaviours. Due to the dynamic nature of human behaviours, continual learning promises HAR systems that are tailored to users' needs. However, because of the difficulty in collecting labelled data with wearable sensors, existing approaches that focus on supervised continual learning have limited applicability, while unsupervised continual learning methods only handle representation learning and delay classifier training to a later stage. This work explores the adoption and adaptation of CaSSLe, a continual self-supervised learning model, and Kaizen, a semi-supervised continual learning model that balances representation learning and downstream classification, for the task of wearable-based HAR. These schemes re-purpose contrastive learning for knowledge retention, and Kaizen combines that with self-training in a unified scheme that can leverage unlabelled and labelled data for continual learning. In addition to comparing state-of-the-art self-supervised continual learning schemes, we further investigated the importance of different loss terms and explored the trade-off between knowledge retention and learning from new tasks. In particular, our extensive evaluation demonstrated that the use of a weighting factor that reflects the ratio between learned and new classes achieves the best overall trade-off in continual learning.
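The abstract only says that the weighting factor "reflects the ratio between learned and new classes"; it does not give a formula. The sketch below shows one plausible reading, purely as an assumption: the weight on the knowledge-retention loss is the fraction of classes already learned, and the remainder goes to the new-task loss. The function names and the convex-combination form are hypothetical.

```python
def class_ratio_weight(num_learned_classes, num_new_classes):
    """Fraction of all classes seen so far that were learned in earlier tasks.

    Assumed form: learned / (learned + new). The source does not specify this.
    """
    total = num_learned_classes + num_new_classes
    if total <= 0:
        raise ValueError("at least one class is required")
    return num_learned_classes / total


def combined_loss(retention_loss, new_task_loss, num_learned_classes, num_new_classes):
    """Blend a knowledge-retention loss and a new-task loss by class ratio."""
    w = class_ratio_weight(num_learned_classes, num_new_classes)
    # More previously learned classes -> more emphasis on retaining them.
    return w * retention_loss + (1.0 - w) * new_task_loss


# Example: 6 classes already learned, 2 new classes arriving in the current task.
weight = class_ratio_weight(6, 2)
loss = combined_loss(retention_loss=1.0, new_task_loss=3.0,
                     num_learned_classes=6, num_new_classes=2)
```

Under this reading, the emphasis on retention grows automatically as more tasks accumulate, without a hand-tuned schedule.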
Supervision by Denoising for Medical Image Segmentation
Young, Sean I., Dalca, Adrian V., Ferrante, Enzo, Golland, Polina, Metzler, Christopher A., Fischl, Bruce, Iglesias, Juan Eugenio
Learning-based image reconstruction models, such as those based on the U-Net, require a large set of labeled images if good generalization is to be guaranteed. In some imaging domains, however, labeled data with pixel- or voxel-level label accuracy are scarce due to the cost of acquiring them. This problem is exacerbated further in domains like medical imaging, where there is no single ground truth label, resulting in large amounts of repeat variability in the labels. Therefore, training reconstruction networks to generalize better by learning from both labeled and unlabeled examples (called semi-supervised learning) is a problem of practical and theoretical interest. However, traditional semi-supervised learning methods for image reconstruction often necessitate handcrafting a differentiable regularizer specific to some given imaging problem, which can be extremely time-consuming. In this work, we propose "supervision by denoising" (SUD), a framework to supervise reconstruction models using their own denoised output as labels. SUD unifies stochastic averaging and spatial denoising techniques under a spatio-temporal denoising framework and alternates denoising and model weight update steps in an optimization framework for semi-supervision. As example applications, we apply SUD to two problems from biomedical imaging, anatomical brain reconstruction (3D) and cortical parcellation (2D), to demonstrate a significant improvement in reconstruction over supervised-only and ensembling baselines.
While reconstruction models such as those based on the U-Net [5] typically outperform handcrafted models in many imaging problems, they can involve millions of parameters and, as a result, have a tendency to overfit training data and generalize poorly to previously unseen images at test time, a problem also exacerbated by distribution shift [6].
[...] reconstruction network has proved extremely useful for imposing topological or spatial priors on the reconstruction [18], [19] and semi-supervised learning (SSL). SSL methods based on regularization suffer neither from limited diversity of augmented data nor domain gaps resulting from training [...]
GEqO: ML-Accelerated Semantic Equivalence Detection
Haynes, Brandon, Alotaibi, Rana, Pavlenko, Anna, Leeka, Jyoti, Jindal, Alekh, Tian, Yuanyuan
Large-scale analytics engines have become a core dependency for modern data-driven enterprises to derive business insights and drive actions. These engines support a large number of analytic jobs processing huge volumes of data on a daily basis, and workloads are often inundated with overlapping computations across multiple jobs. Reusing common computation is crucial for efficient cluster resource utilization and reducing job execution time. Detecting common computation is the first and key step for reducing this computational redundancy. However, detecting equivalence on large-scale analytics engines requires efficient and scalable solutions that are fully automated. In addition, to maximize computation reuse, equivalence needs to be detected at the semantic level instead of just the syntactic level (i.e., the ability to detect semantic equivalence of seemingly different-looking queries). Unfortunately, existing solutions fall short of satisfying these requirements. In this paper, we take a major step towards filling this gap by proposing GEqO, a portable and lightweight machine-learning-based framework for efficiently identifying semantically equivalent computations at scale. GEqO introduces two machine-learning-based filters that quickly prune out nonequivalent subexpressions and employs a semi-supervised learning feedback loop to iteratively improve its model with an intelligent sampling mechanism. Further, with its novel database-agnostic featurization method, GEqO can transfer the learning from one workload and database to another. Our extensive empirical evaluation shows that, on TPC-DS-like queries, GEqO yields significant performance gains (up to 200x faster than automated verifiers) and finds up to 2x more equivalences than optimizer- and signature-based equivalence detection approaches.
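The idea of pruning candidates with cheap filters before any expensive check can be illustrated with a generic filter cascade. This is a sketch under stated assumptions and not GEqO itself: the filters here are hand-written stand-ins rather than learned models, the toy "expressions" are plain strings, and passing the survivors to a final verifier is an assumption about how such a pipeline would typically be wired. All function names are hypothetical.

```python
def detect_equivalences(candidate_pairs, filters, verify):
    """Apply cheap filters in order, then run the costly verifier on survivors.

    Returns the verified pairs and the number of verifier calls made.
    """
    survivors = list(candidate_pairs)
    for keep in filters:
        # Each filter drops pairs it judges to be non-equivalent.
        survivors = [pair for pair in survivors if keep(pair)]
    verified = [pair for pair in survivors if verify(pair)]
    return verified, len(survivors)


# Toy stand-ins (assumptions): equivalence means "same tokens in any order".
def same_length(pair):
    return len(pair[0]) == len(pair[1])


def same_token_set(pair):
    return set(pair[0].split()) == set(pair[1].split())


def toy_verifier(pair):
    return sorted(pair[0].split()) == sorted(pair[1].split())


pairs = [("a + b", "b + a"), ("a + b", "a * b"), ("a + b", "a + b + c")]
verified, verifier_calls = detect_equivalences(
    pairs, [same_length, same_token_set], toy_verifier
)
```

In this toy run only one of the three candidate pairs reaches the verifier, which is the effect a cascade is meant to have: most non-equivalent pairs are rejected before the expensive step.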
Kernel Density Estimation for Multiclass Quantification
Moreo, Alejandro, González, Pablo, del Coz, Juan José
Quantification (variously called learning to quantify or class prevalence estimation) is the area of supervised machine learning concerned with estimating the percentages of instances from a population (hereafter, a bag of examples) belonging to each of the classes of interest [González et al., 2017, Esuli et al., 2023]. Quantification finds applications in many disciplines, like the social sciences, epidemiology, or market research, in which the interest lies at the aggregate level, i.e., in which inferring characteristics of the single individual (e.g., via classification, or via regression) is of little concern since knowing group-level information is all we need. Despite the fact that binary quantification (i.e., the setting in which the classes of interest are positive vs. negative) has been, by far, the most studied scenario in the quantification literature [Card and Smith, 2018, Forman, 2008, Bella et al., 2010, Esuli and Sebastiani, 2015, Hassan et al., 2020, Moreo and Sebastiani, 2021], the truth is that many of the applications of quantification naturally arise in the multiclass regime, i.e., in cases in which there are more than two mutually exclusive classes. Examples of multiclass settings are ubiquitous, and may include the allocation of human resources to different departments in a company [Forman, 2005], the analysis of different phytoplankton species that could exist in a water sample [González et al., 2019], or the analysis of the various causes of death studied in verbal autopsies [King and Lu, 2008], to name a few. A more concrete example could consist of providing answers to questions like: "What is the percentage of tweets conveying positive, neutral, and negative opinions concerning a specific hashtag?"
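As a small illustration of what a quantifier outputs, the sketch below implements the simplest baseline, usually called "classify and count": label every item in the bag with some classifier and report the fraction assigned to each class. This is not the kernel density estimation method named in the title; the function name and the example labels are hypothetical, and the predicted labels are assumed to come from an existing classifier.

```python
from collections import Counter


def classify_and_count(predicted_labels, classes):
    """Estimate class prevalence in a bag as the fraction of predictions per class."""
    if not predicted_labels:
        raise ValueError("the bag must contain at least one item")
    counts = Counter(predicted_labels)
    n = len(predicted_labels)
    # Classes that never appear in the predictions get a prevalence of 0.0.
    return {c: counts.get(c, 0) / n for c in classes}


# Hypothetical classifier output for a bag of four tweets about one hashtag.
bag_predictions = ["positive", "neutral", "negative", "positive"]
prevalence = classify_and_count(bag_predictions, ["positive", "neutral", "negative"])
```

The result is a distribution over the classes for the whole bag; no individual prediction is reported, which matches the aggregate-level interest described above.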
Freeze the backbones: A Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-training
Qin, Jiuming, Liu, Che, Cheng, Sibo, Guo, Yike, Arcucci, Rossella
Modern healthcare often utilises radiographic images alongside textual reports for diagnostics, encouraging the use of Vision-Language Self-Supervised Learning (VL-SSL) with large pre-trained models to learn versatile medical vision representations. However, most existing VL-SSL frameworks are trained end-to-end, which is computation-heavy and can lose vital prior information embedded in pre-trained encoders. To address both issues, we introduce the backbone-agnostic Adaptor framework, which preserves medical knowledge in pre-trained image and text encoders by keeping them frozen, and employs a lightweight Adaptor module for cross-modal learning. Experiments on medical image classification and segmentation tasks across three datasets reveal that our framework delivers competitive performance while cutting trainable parameters by over 90% compared to current pre-training approaches. Notably, when fine-tuned with just 1% of data, Adaptor outperforms several Transformer-based methods trained on full datasets in medical image segmentation.
SASSL: Enhancing Self-Supervised Learning via Neural Style Transfer
Rojas-Gomez, Renan A., Singhal, Karan, Etemad, Ali, Bijamov, Alex, Morningstar, Warren R., Mansfield, Philip Andrew
Self-supervised learning relies heavily on data augmentation to extract meaningful representations from unlabeled images. While existing state-of-the-art augmentation pipelines incorporate a wide range of primitive transformations, these often disregard natural image structure. Thus, augmented samples can exhibit degraded semantic information and low stylistic diversity, affecting downstream performance of self-supervised representations. To overcome this, we propose SASSL: Style Augmentations for Self Supervised Learning, a novel augmentation technique based on Neural Style Transfer. The method decouples semantic and stylistic attributes in images and applies transformations exclusively to the style while preserving content, generating diverse augmented samples that better retain their semantic properties. Experimental results show our technique achieves a top-1 classification performance improvement of more than 2% on ImageNet compared to the well-established MoCo v2. Our experiments indicate that decoupling style from content information and transferring style across datasets to diversify augmentations can significantly improve downstream performance of self-supervised representations.
Data labeling is a challenging and expensive process, which often serves as a barrier to building machine learning models to solve real-world problems. Self-supervised learning (SSL) is an emerging machine learning paradigm that helps to alleviate the challenges of data labeling by using large corpora of unlabeled data to pre-train models to learn robust and general representations. These representations can be efficiently transferred to downstream tasks, resulting in performant models which can be constructed without access to large pools of labeled data. SSL methods have shown promising results in recent years, matching and in some cases exceeding the performance of bespoke supervised models with small amounts of labeled data.
Given the lack of labels, SSL relies on pretext tasks, i.e., predefined tasks where pseudo-labels can be generated. Some examples include contrastive learning (Chen et al., 2020a; He et al., 2020), clustering (Caron et al., 2021; 2020; Assran et al., 2022), and generative modeling (He et al., 2022; Devlin et al., 2018). Many of these pretext tasks involve training the model to distinguish between different views of the same input and inputs corresponding to different samples. For these tasks, the way input data is augmented is crucial for the network to learn useful invariances and extract robust representations (Chen et al., 2020a). While state-of-the-art augmentations incorporate a wide range of primitive color, spectral and spatial transformations, they often disregard the natural structure of an image.
Boosting Transformer's Robustness and Efficacy in PPG Signal Artifact Detection with Self-Supervised Learning
Recent research at CHU Sainte Justine's Pediatric Critical Care Unit (PICU) has revealed that traditional machine learning methods, such as semi-supervised label propagation and K-nearest neighbors, outperform Transformer-based models in artifact detection from PPG signals, mainly when data is limited. This study addresses the underutilization of abundant unlabeled data by employing self-supervised learning (SSL) to extract latent features from these data, followed by fine-tuning on labeled data. Our experiments demonstrate that SSL significantly enhances the Transformer model's ability to learn representations, improving its robustness in artifact classification tasks. Among various SSL techniques, including masking, contrastive learning, and DINO (self-distillation with no labels), contrastive learning exhibited the most stable and superior performance in small PPG datasets. Further, we delve into optimizing contrastive loss functions, which are crucial for contrastive SSL. Inspired by InfoNCE, we introduce a novel contrastive loss function that facilitates smoother training and better convergence, thereby enhancing performance in artifact classification. In summary, this study establishes the efficacy of SSL in leveraging unlabeled data, particularly in enhancing the capabilities of the Transformer model. This approach holds promise for broader applications in PICU environments, where annotated data is often limited.
Hard View Selection for Self-Supervised Learning
Ferreira, Fabio, Rapant, Ivo, Hutter, Frank
Many Self-Supervised Learning (SSL) methods train their models to be invariant to different "views" of an image, and considerable effort has been directed towards improving pretext tasks, architectures, or robustness. However, most SSL methods remain reliant on the random sampling of operations within the image augmentation pipeline, such as the random resized crop operation. We argue that the role of the view generation and its effect on performance has so far received insufficient attention. To address this, we propose an easy, learning-free, yet powerful Hard View Selection (HVS) strategy designed to extend the random view generation to expose the pretrained model to harder samples during SSL training. It encompasses the following iterative steps: 1) randomly sample multiple views and create pairs of two views, 2) run forward passes for each view pair on the currently trained model, 3) adversarially select the pair yielding the worst loss depending on the current model state, and 4) run the backward pass with the selected pair. As a result, HVS consistently achieves accuracy improvements between 0.91% and 1.93% on ImageNet linear evaluation and similar improvements on transfer tasks across DINO, SimSiam, iBOT and SimCLR. We provide studies to shed light on the inner workings and show that, by naively using smaller resolution images for the selection step, we can significantly reduce the computational overhead while retaining performance. Surprisingly, even when accounting for the computational overhead incurred by HVS, we achieve performance gains between 0.52% and 1.02% and closely rival the 800-epoch DINO pretraining with only 300 epochs.
Various approaches to learn effective and generalizable visual representations in Self-Supervised Learning (SSL) exist. [...] Such views are generated by applying a sequence of (randomly sampled) image transformations and are usually composed of geometric (cropping, rotation, etc.) and appearance (color distortion, blurring, etc.) transformations. A body of literature (Chen et al., 2020a; Wu et al., 2020; Purushwalkam & Gupta, 2020; Wagner et al., 2022; Tian et al., 2020b) has illuminated the effects of image views on representation learning and identified random resized crop (RRC) transformation, which [...] However, despite this finding and to the best of our knowledge, little research has gone into identifying more effective ways for selecting or generating views to improve performance.
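The four HVS steps listed in the abstract can be sketched as a selection routine that is independent of any particular SSL method. This is a minimal illustration under assumptions, not the authors' code: `augment` and `pair_loss` are hypothetical callables supplied by the caller, pairs are formed as all two-view combinations (the abstract does not say how pairs are built), and the backward pass of step 4 is left to the surrounding training loop.

```python
import itertools
import random


def select_hard_pair(image, augment, pair_loss, num_views=4, rng=None):
    """Return the view pair with the highest loss under the current model.

    augment(image, rng) -> one randomly augmented view (hypothetical callable).
    pair_loss(a, b)     -> scalar loss from a forward pass (hypothetical callable).
    """
    if num_views < 2:
        raise ValueError("num_views must be at least 2 to form a pair")
    rng = rng or random.Random()
    # Step 1: randomly sample several views and form pairs of two views.
    views = [augment(image, rng) for _ in range(num_views)]
    pairs = list(itertools.combinations(views, 2))
    # Step 2: forward pass (loss only, no gradient step) for every pair.
    losses = [pair_loss(a, b) for a, b in pairs]
    # Step 3: pick the pair with the worst (highest) loss.
    hardest = max(range(len(pairs)), key=losses.__getitem__)
    # Step 4 (backward pass on the selected pair) is done by the caller.
    return pairs[hardest], losses[hardest]


# Toy usage: "images" are numbers, augmentation adds noise, loss is the distance.
toy_pair, toy_loss = select_hard_pair(
    10.0,
    augment=lambda img, rng: img + rng.uniform(-1.0, 1.0),
    pair_loss=lambda a, b: abs(a - b),
    rng=random.Random(0),
)
```

Because the selection uses only forward passes, its extra cost scales with the number of candidate pairs, which is consistent with the abstract's remark that running the selection step on smaller-resolution images reduces the overhead.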