 Unsupervised or Indirectly Supervised Learning


Black-box Certification and Learning under Adversarial Perturbations

arXiv.org Machine Learning

We formally study the problem of classification under adversarial perturbations, both from the learner's perspective, and from the viewpoint of a third-party who aims at certifying the robustness of a given black-box classifier. We analyze a PAC-type framework of semi-supervised learning and identify possibility and impossibility results for proper learning of VC-classes in this setting. We further introduce and study a new setting of black-box certification under limited query budget. We analyze this for various classes of predictors and types of perturbation. We also consider the viewpoint of a black-box adversary that aims at finding adversarial examples, showing that the existence of an adversary with polynomial query complexity implies the existence of a robust learner with small sample complexity.
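The black-box adversary viewpoint can be illustrated with a toy sketch. This is not the paper's algorithm: it is a naive random search, and the names `find_adversarial`, `predict`, `radius`, and `budget` are hypothetical. It queries a classifier it can only call, never inspect, at random points in an L-infinity ball around an input, and stops when it finds a label change or exhausts its query budget.

```python
import numpy as np

def find_adversarial(predict, x, radius, budget, rng):
    """Naive black-box search for an adversarial example (illustrative only).

    Queries `predict` once at x, then at up to `budget` random points in the
    L-infinity ball of the given radius around x. Returns a point whose
    predicted label differs from predict(x), or None if none was found.
    """
    base = predict(x)
    for _ in range(budget):
        candidate = x + rng.uniform(-radius, radius, size=x.shape)
        if predict(candidate) != base:
            return candidate
    return None

# Hypothetical black-box classifier: a threshold on the first coordinate.
predict = lambda z: int(z[0] > 0)
rng = np.random.default_rng(0)
near_boundary = find_adversarial(predict, np.array([0.1, 0.0]), 0.5, 100, rng)
far_from_boundary = find_adversarial(predict, np.array([2.0, 0.0]), 0.5, 100, rng)
```

A `None` result only means the search failed within its budget. It does not certify robustness, which is why certification under a limited query budget is a distinct and harder problem than attack.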


The Math Behind Generative Adversarial Networks Clearly Explained!

#artificialintelligence

GANs are considered one of the greatest breakthroughs in the field of Artificial Intelligence. In this video, I've tried my best to explain the core concepts of GANs.


How to tell if your model is over-fit using unlabeled data

#artificialintelligence

In many settings, unlabeled data is plentiful (think images, text, etc.), while sufficient labeled data for supervised learning might be harder to obtain. In these situations, it can be difficult to determine how well the model will generalize. Most methods for assessing model performance rely on labeled data alone. Without enough labeled data these can be unreliable. Is there anything more we can learn about the model's ability to generalize from unlabeled data? In this article, I demonstrate how unlabeled data can frequently be used to bound test loss.


Generative Adversarial Networks (GANs) & Bayesian Networks

#artificialintelligence

Generative Adversarial Network (GAN) software produces forgeries and imitations of data (also known as synthetic data or fake data). Human beings have been making fakes, with good or evil intent, of almost everything they possibly can, since the beginning of the human race. Thus, perhaps not too surprisingly, GAN software has been widely used since it was first proposed in this amazingly recent 2014 paper. To gauge how widely GAN software has been used so far, see, for example, this 2019 article entitled "18 Impressive Applications of Generative Adversarial Networks (GANs)". GANs can forge sounds (voices, music, ...), images (realistic pictures, paintings, drawings, handwriting, ...), text, etc. The forgeries can be tweaked so that they range from being very similar to the originals to being whimsical exaggerations thereof.


GAN Papers to Read in 2020

#artificialintelligence

Generative Adversarial Networks (GANs) are one of the most innovative ideas proposed in this decade. At their core, GANs are an unsupervised model for generating new elements from a set of similar elements. For instance, they can produce original face pictures given a collection of face images, or create new tunes out of preexisting melodies. GANs have found applications in image, text, and sound generation, and are at the core of technologies such as AI music, deep fakes, and content-aware image editing. Besides pure generation, GANs have also been applied to transforming images from one domain to another and as a means for style transfer.
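For reference, the standard GAN formulation trains a generator G against a discriminator D in a two-player minimax game. The notation below follows common convention rather than any of the articles listed here: p_data is the data distribution and p_z is the noise prior fed to the generator.

$$
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
$$

The discriminator is rewarded for assigning high probability to real samples and low probability to generated ones, while the generator is rewarded for producing samples the discriminator accepts as real.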


The Basics of Machine Learning

#artificialintelligence

If you read all those books and look around the internet a little, you will probably learn what machine learning is, but I like Arthur Samuel's definition: "A field of study that gives computers the ability to learn without being explicitly programmed." In summary, machine learning is a sub-field of artificial intelligence in which we design systems that learn from the data they are trained on. There are four types of machine learning, but two of them are the most used: supervised and unsupervised learning. Supervised learning is when you know the output, so you work with a set of labeled data. A classic example is classifying email messages as spam or non-spam: you feed the algorithm the inputs and the outputs and, based on that experience, it eventually predicts a class for data it has never seen. Supervised machine learning includes two major tasks: classification and regression. Unsupervised learning, on the other hand, lets the algorithm learn on its own: formally, the algorithm finds hidden patterns in a large amount of data. There is no right or wrong answer; you just train it and look at the patterns it finds.
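The difference between the two settings can be made concrete with a small sketch. The function names and the toy data are illustrative assumptions, not taken from the article: a nearest-centroid classifier stands in for supervised learning (it needs the labels `y`), and a two-cluster k-means stands in for unsupervised learning (it sees only `X`).

```python
import numpy as np

def fit_centroids(X, y):
    """Supervised: compute one centroid per known label."""
    labels = np.unique(y)
    return labels, np.array([X[y == c].mean(axis=0) for c in labels])

def predict_nearest(X, labels, centroids):
    """Assign each row of X the label of its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return labels[d.argmin(axis=1)]

def two_means(X, n_iters=10):
    """Unsupervised: split X into two clusters without using any labels."""
    # Deterministic start: the first point and the point farthest from it.
    far = np.linalg.norm(X - X[0], axis=1).argmax()
    centers = np.array([X[0], X[far]], dtype=float)
    for _ in range(n_iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return assign  # cluster indices from the final iteration

# Toy data: two well-separated groups (say, non-spam = 0 and spam = 1).
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

labels, centroids = fit_centroids(X, y)            # uses the labels
predicted = predict_nearest(np.array([[0.5, 0.5], [9.0, 9.0]]), labels, centroids)
clusters = two_means(X)                            # never sees y
```

The supervised model returns the original label names, whereas the clustering only returns arbitrary cluster indices: it recovers the grouping but has no notion of which group is "spam".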


Google Brain's SimCLRv2 Achieves New SOTA in Semi-Supervised Learning

#artificialintelligence

Following on the February release of its contrastive learning framework SimCLR, the same team of Google Brain researchers guided by Turing Award honouree Dr. Geoffrey Hinton has presented SimCLRv2, an upgraded approach that boosts the SOTA results by 21.6 percent. The updated framework takes the "unsupervised pretrain, supervised fine-tune" paradigm popular in natural language processing and applies it to image recognition. Unlabelled data is learned in a task-agnostic way in the pretraining phase, which means the model has no prior classification knowledge. The researchers find that using a deep and wide neural network can be more label-efficient and greatly improve accuracy. Unlike SimCLR, whose largest model is ResNet-50, SimCLRv2's largest model is a 152-layer ResNet with channels three times wider and with selective kernels.


Pushing the Limit of Unsupervised Learning for Ultrasound Image Artifact Removal

arXiv.org Machine Learning

Ultrasound (US) imaging is a fast and non-invasive imaging modality that is widely used for real-time clinical imaging applications without concern about radiation hazards. Unfortunately, it often suffers from poor visual quality arising from various sources, such as speckle noise, blurring, multi-line acquisition (MLA), limited RF channels, a small number of view angles in the case of plane wave imaging, etc. Classical methods to deal with these problems include image-domain signal processing approaches using various adaptive filtering and model-based approaches. Recently, deep learning approaches have been successfully used in the ultrasound imaging field. However, one of the limitations of these approaches is that paired high-quality images for supervised training are difficult to obtain in many practical applications. In this paper, inspired by the recent theory of unsupervised learning using optimal transport driven cycleGAN (OT-cycleGAN), we investigate the applicability of unsupervised deep learning to US artifact removal problems without matched reference data. Experimental results for various tasks such as deconvolution, speckle removal, limited data artifact removal, etc. confirmed that our unsupervised learning method provides results comparable to supervised learning for many practical applications.


Machine Learning - Redcrix Technologies (P) Ltd.

#artificialintelligence

Supervised machine learning algorithms can apply what has been learned in the past to new data using labeled examples to predict future events. Starting from the analysis of a known training dataset, the learning algorithm produces an inferred function to make predictions about the output values. The system is able to provide targets for any new input after sufficient training. The learning algorithm can also compare its output with the correct, intended output and find errors in order to modify the model accordingly. In contrast, unsupervised machine learning algorithms are used when the information used to train is neither classified nor labeled.


Statistical and Algorithmic Insights for Semi-supervised Learning with Self-training

arXiv.org Machine Learning

Self-training is a classical approach in semi-supervised learning that has been successfully applied to a variety of machine learning problems. The self-training algorithm generates pseudo-labels for the unlabeled examples and progressively refines these pseudo-labels, which hopefully coincide with the actual labels. This work provides theoretical insights into the self-training algorithm with a focus on linear classifiers. We first investigate Gaussian mixture models and provide a sharp non-asymptotic finite-sample characterization of the self-training iterations. Our analysis reveals the provable benefits of rejecting samples with low confidence and demonstrates that self-training iterations gracefully improve the model accuracy even if they do get stuck in sub-optimal fixed points. We then demonstrate that regularization and class margin (i.e. separation) are provably important for success, and that a lack of regularization may prevent self-training from identifying the core features in the data. Finally, we discuss statistical aspects of empirical risk minimization with self-training for general distributions. We show how a purely unsupervised notion of generalization based on self-training-based clustering can be formalized in terms of cluster margin. We then establish a connection between self-training-based semi-supervision and the more general problem of learning with heterogeneous data and weak supervision.
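The pseudo-labeling loop described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's exact procedure: the classifier is the linear rule sign(w · x), the refit step is a plain average of y · x, and the helper names (`make_mixture`, `self_train`) and the margin `threshold` are hypothetical.

```python
import numpy as np

def make_mixture(n, mu, rng):
    """Draw n points x = y * mu + standard Gaussian noise, with y in {-1, +1}."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu + rng.standard_normal((n, mu.size))
    return X, y

def self_train(X_lab, y_lab, X_unl, n_iters=5, threshold=0.5):
    """Self-training for a linear classifier sign(w @ x), labels in {-1, +1}.

    Each round pseudo-labels the unlabeled points with the current direction,
    rejects points whose margin |w @ x| is below `threshold`, and refits w as
    the average of y * x over the labeled and the accepted points.
    """
    # Initial estimate from the labeled data only.
    w = (y_lab[:, None] * X_lab).mean(axis=0)
    for _ in range(n_iters):
        w_unit = w / np.linalg.norm(w)
        scores = X_unl @ w_unit
        keep = np.abs(scores) >= threshold      # reject low-confidence samples
        pseudo = np.sign(scores[keep])          # pseudo-labels for the rest
        X_all = np.vstack([X_lab, X_unl[keep]])
        y_all = np.concatenate([y_lab, pseudo])
        w = (y_all[:, None] * X_all).mean(axis=0)
    return w / np.linalg.norm(w)

# Usage on a synthetic two-component Gaussian mixture (assumed setup).
rng = np.random.default_rng(0)
mu = np.array([2.0, 0.0, 0.0, 0.0, 0.0])
X_lab, y_lab = make_mixture(10, mu, rng)        # few labeled points
X_unl, _ = make_mixture(1000, mu, rng)          # many unlabeled points
X_test, y_test = make_mixture(2000, mu, rng)
w = self_train(X_lab, y_lab, X_unl)
accuracy = np.mean(np.sign(X_test @ w) == y_test)
```

Setting `threshold` to 0 keeps every pseudo-label, which makes it easy to compare against the rejection variant on the same data.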