Goto

Collaborating Authors

 Inductive Learning


Contrastive and Non-Contrastive Self-Supervised Learning Recover Global and Local Spectral Embedding Methods

arXiv.org Machine Learning

Self-Supervised Learning (SSL) surmises that inputs and pairwise positive relationships are enough to learn meaningful representations. Although SSL has recently reached a milestone: outperforming supervised methods in many modalities\dots the theoretical foundations are limited, method-specific, and fail to provide principled design guidelines to practitioners. In this paper, we propose a unifying framework under the helm of spectral manifold learning to address those limitations. Through the course of this study, we will rigorously demonstrate that VICReg, SimCLR, BarlowTwins et al. correspond to eponymous spectral methods such as Laplacian Eigenmaps, Multidimensional Scaling et al. This unification will then allow us to obtain (i) the closed-form optimal representation for each method, (ii) the closed-form optimal network parameters in the linear regime for each method, (iii) the impact of the pairwise relations used during training on each of those quantities and on downstream task performances, and most importantly, (iv) the first theoretical bridge between contrastive and non-contrastive methods towards global and local spectral embedding methods respectively, hinting at the benefits and limitations of each. For example, (i) if the pairwise relation is aligned with the downstream task, any SSL method can be employed successfully and will recover the supervised method, but in the low data regime, VICReg's invariance hyper-parameter should be high; (ii) if the pairwise relation is misaligned with the downstream task, VICReg with small invariance hyper-parameter should be preferred over SimCLR or BarlowTwins.


Multi-Mask Self-Supervised Learning for Physics-Guided Neural Networks in Highly Accelerated MRI

arXiv.org Artificial Intelligence

Self-supervised learning has shown great promise due to its capability to train deep learning MRI reconstruction methods without fully-sampled data. Current self-supervised learning methods for physics-guided reconstruction networks split acquired undersampled data into two disjoint sets, where one is used for data consistency (DC) in the unrolled network and the other to define the training loss. In this study, we propose an improved self-supervised learning strategy that more efficiently uses the acquired data to train a physics-guided reconstruction network without a database of fully-sampled data. The proposed multi-mask self-supervised learning via data undersampling (SSDU) applies a hold-out masking operation on acquired measurements to split it into multiple pairs of disjoint sets for each training sample, while using one of these pairs for DC units and the other for defining loss, thereby more efficiently using the undersampled data. Multi-mask SSDU is applied on fully-sampled 3D knee and prospectively undersampled 3D brain MRI datasets, for various acceleration rates and patterns, and compared to CG-SENSE and single-mask SSDU DL-MRI, as well as supervised DL-MRI when fully-sampled data is available. Results on knee MRI show that the proposed multi-mask SSDU outperforms SSDU and performs closely with supervised DL-MRI. A clinical reader study further ranks the multi-mask SSDU higher than supervised DL-MRI in terms of SNR and aliasing artifacts. Results on brain MRI show that multi-mask SSDU achieves better reconstruction quality compared to SSDU. Reader study demonstrates that multi-mask SSDU at R=8 significantly improves reconstruction compared to single-mask SSDU at R=8, as well as CG-SENSE at R=2.


Modeling Disagreement in Automatic Data Labelling for Semi-Supervised Learning in Clinical Natural Language Processing

arXiv.org Machine Learning

Computational models providing accurate estimates of their uncertainty are crucial for risk management associated with decision making in healthcare contexts. This is especially true since many state-of-the-art systems are trained using the data which has been labelled automatically (self-supervised mode) and tend to overfit. In this work, we investigate the quality of uncertainty estimates from a range of current state-of-the-art predictive models applied to the problem of observation detection in radiology reports. This problem remains understudied for Natural Language Processing in the healthcare domain. We demonstrate that Gaussian Processes (GPs) provide superior performance in quantifying the risks of 3 uncertainty labels based on the negative log predictive probability (NLPP) evaluation metric and mean maximum predicted confidence levels (MMPCL), whilst retaining strong predictive performance.


Semi-Supervised Learning

#artificialintelligence

Image classification is the most common computer vision problem where an algorithm process an image and classifies the classes. This technique extended with object detection algorithms, where it uses localization with the classification. In object detection methods object is localized by a bounding box, where the bounding box is represented by four value points according to the pixels in an image. If you are trying to train an object detection model with custom data, human resources are required to annotate enormous amounts of data manually. Consider a large amount of image data set that need to train on a model, and manually labeling all of this data ourselves may take a long time and logistically difficult.


MCD: Marginal Contrastive Discrimination for conditional density estimation

arXiv.org Machine Learning

We consider the problem of conditional density estimation, which is a major topic of interest in the fields of statistical and machine learning. Our method, called Marginal Contrastive Discrimination, MCD, reformulates the conditional density function into two factors, the marginal density function of the target variable and a ratio of density functions which can be estimated through binary classification. Like noise-contrastive methods, MCD can leverage state-of-the-art supervised learning techniques to perform conditional density estimation, including neural networks. Our benchmark reveals that our method significantly outperforms in practice existing methods on most density models and regression datasets.


Generalization for multiclass classification with overparameterized linear models

arXiv.org Machine Learning

Via an overparameterized linear model with Gaussian features, we provide conditions for good generalization for multiclass classification of minimum-norm interpolating solutions in an asymptotic setting where both the number of underlying features and the number of classes scale with the number of training points. The survival/contamination analysis framework for understanding the behavior of overparameterized learning problems is adapted to this setting, revealing that multiclass classification qualitatively behaves like binary classification in that, as long as there are not too many classes (made precise in the paper), it is possible to generalize well even in some settings where the corresponding regression tasks would not generalize. Besides various technical challenges, it turns out that the key difference from the binary classification setting is that there are relatively fewer positive training examples of each class in the multiclass setting as the number of classes increases, making the multiclass problem "harder" than the binary one.


Self-Supervised Learning And Its Applications - AI Summary

#artificialintelligence

The focus was largely on supervised learning methods that require huge amounts of labeled data to train systems for specific use cases. Bidirectional Encoder Representations from Transformers (BERT) a paper published by researchers at the Google AI team has become a gold standard when it comes to several NLP tasks such as Natural Language Inference (MNLI), Question Answering (SQuAD), and more. To make BERT handle a variety of downstream tasks, input representation is able to unambiguously represent a pair of sentences that are packed together in a single sequence. While autoencoding models like BERT utilize self-supervised learning for tasks like sentence classification (next or not), another application of self-supervised approaches lies in the domain of text generation. The inputs are passed through our pre-trained model to obtain the final transformer block's activation hm l, which is then fed into an added linear output layer with parameters W y to predict y: Translation Language Modelling (TLM): a new addition and an extension of MLM, where instead of considering monolingual text streams, parallel sentences are concatenated as illustrated in the following image.


GitHub - jason718/awesome-self-supervised-learning: A curated list of awesome self-supervised methods

#artificialintelligence

Self-Supervised Learning has become an exciting direction in AI community. Predicting What You Already Know Helps: Provable Self-Supervised Learning. For self-supervised learning, Rationality implies generalization, provably. Can Pretext-Based Self-Supervised Learning Be Boosted by Downstream Data? FAIR Self-Supervision Benchmark [pdf] [repo]: various benchmark (and legacy) tasks for evaluating quality of visual representations learned by various self-supervision approaches.


Self-Supervised Learning and Its Applications - neptune.ai

#artificialintelligence

In the past decade, the research and development in AI have skyrocketed, especially after the results of the ImageNet competition in 2012. The focus was largely on supervised learning methods that require huge amounts of labeled data to train systems for specific use cases. In this article, we will explore Self Supervised Learning (SSL) – a hot research topic in a machine learning community. Self-supervised learning (SSL) is an evolving machine learning technique poised to solve the challenges posed by the over-dependence of labeled data. For many years, building intelligent systems using machine learning methods has been largely dependent on good quality labeled data. Consequently, the cost of high-quality annotated data is a major bottleneck in the overall training process.


Additive Logistic Mechanism for Privacy-Preserving Self-Supervised Learning

arXiv.org Machine Learning

We study the privacy risks that are associated with training a neural network's weights with self-supervised learning algorithms. Through empirical evidence, we show that the fine-tuning stage, in which the network weights are updated with an informative and often private dataset, is vulnerable to privacy attacks. To address the vulnerabilities, we design a post-training privacy-protection algorithm that adds noise to the fine-tuned weights and propose a novel differential privacy mechanism that samples noise from the logistic distribution. Compared to the two conventional additive noise mechanisms, namely the Laplace and the Gaussian mechanisms, the proposed mechanism uses a bell-shaped distribution that resembles the distribution of the Gaussian mechanism, and it satisfies pure $\epsilon$-differential privacy similar to the Laplace mechanism. We apply membership inference attacks on both unprotected and protected models to quantify the trade-off between the models' privacy and performance. We show that the proposed protection algorithm can effectively reduce the attack accuracy to roughly 50\%-equivalent to random guessing-while maintaining a performance loss below 5\%.