 Unsupervised or Indirectly Supervised Learning


'Creative' Facial Verification with Generative Adversarial Networks

#artificialintelligence

A new paper from Stanford University proposes a nascent method for fooling the facial authentication systems of platforms such as dating apps, using a Generative Adversarial Network (GAN) to create alternative face images that contain the same essential ID information as a real face. The method successfully bypassed the facial verification processes of the dating applications Tinder and Bumble, in one case even passing off a gender-swapped (male) face as authentic to the source (female) identity. According to the author, the work represents the first attempt to bypass facial verification using generated images that have been imbued with specific identity traits, but which attempt to represent an alternate or substantially altered identity. The technique was first tested on a custom local face verification system, and then performed well in black-box tests against two dating applications that perform facial verification on user-uploaded images.


Elbow detection for clustering using splines

#artificialintelligence

Among the methods offered by machine learning and artificial intelligence, clustering methods are among the most interesting. These methods belong to the class of unsupervised methods, and as such do not suffer from bias or presuppositions, since they do not seek to learn a known rule, but rather to identify unknown links. Their appeal, therefore, lies in their ability to make sense of data whose volume and/or cardinality exceed the processing capabilities of a human. In the field of artificial intelligence, there are two major classes of methods: the supervised approach and the unsupervised approach. They are distinguished by the form of the problem that is posed to the learning algorithm.


Machine Learning…

#artificialintelligence

Classical machine learning is often categorized by how an algorithm learns to become more accurate in its predictions. In unsupervised learning, the algorithm scans through data sets looking for any meaningful connection. Both the data that algorithms train on and the predictions or recommendations they output are predetermined. In semi-supervised learning, data scientists may feed an algorithm mostly labeled training data, but the model is free to explore the data on its own and develop its own understanding of the data set. In reinforcement learning, data scientists program an algorithm to complete a task and give it positive or negative cues as it works out how to complete that task.


Addressing Missing Sources with Adversarial Support-Matching

arXiv.org Machine Learning

When trained on diverse labeled data, machine learning models have proven themselves to be a powerful tool in all facets of society. However, due to budget limitations, deliberate or non-deliberate censorship, and other problems during data collection and curation, the labeled training set might exhibit a systematic shortage of data for certain groups. We investigate a scenario in which the absence of certain data is linked to the second level of a two-level hierarchy in the data. Inspired by the idea of protected groups from algorithmic fairness, we refer to the partitions carved by this second level as "subgroups"; we refer to combinations of subgroups and classes, or leaves of the hierarchy, as "sources". To characterize the problem, we introduce the concept of classes with incomplete subgroup support. The representational bias in the training set can give rise to spurious correlations between the classes and the subgroups which render standard classification models ungeneralizable to unseen sources. To overcome this bias, we make use of an additional, diverse but unlabeled dataset, called the "deployment set", to learn a representation that is invariant to subgroup. This is done by adversarially matching the support of the training and deployment sets in representation space. In order to learn the desired invariance, it is paramount that the sets of samples observed by the discriminator are balanced by class; this is easily achieved for the training set, but requires using semi-supervised clustering for the deployment set. We demonstrate the effectiveness of our method with experiments on several datasets and variants of the problem.


Meta's Yann LeCun strives for human-level AI

#artificialintelligence

What is the next step toward bridging the gap between natural and artificial intelligence? Scientists and researchers are divided on the answer. Yann LeCun, Chief AI Scientist at Meta and the recipient of the 2018 Turing Award, is betting on self-supervised learning: machine learning models that can be trained without the need for human-labeled examples. LeCun has been thinking and talking about self-supervised and unsupervised learning for years.


Representation Learning via Consistent Assignment of Views to Clusters

arXiv.org Artificial Intelligence

We introduce Consistent Assignment for Representation Learning (CARL), an unsupervised learning method to learn visual representations by combining ideas from self-supervised contrastive learning and deep clustering. By viewing contrastive learning from a clustering perspective, CARL learns unsupervised representations by learning a set of general prototypes that serve as energy anchors, forcing different views of a given image to be assigned to the same prototype. Unlike contemporary work on contrastive learning with deep clustering, CARL learns the set of general prototypes in an online fashion, using gradient descent, without needing non-differentiable algorithms or K-Means to solve the cluster assignment problem. CARL surpasses its competitors in many representation learning benchmarks, including linear evaluation, semi-supervised learning, and transfer learning.


Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling

arXiv.org Artificial Intelligence

Speech Emotion Recognition (SER) applications are frequently associated with privacy concerns, because they often acquire speech data on the client side and transmit it to remote cloud platforms for further processing. These speech data can reveal not only speech content and affective information but also the speaker's identity, demographic traits, and health status. Federated learning (FL) is a distributed machine learning algorithm that coordinates clients to train a model collaboratively without sharing local data. This algorithm shows enormous potential for SER applications, since sharing raw speech or speech features from a user's device is vulnerable to privacy attacks. However, a major challenge in FL is the limited availability of high-quality labeled data samples. In this work, we propose a semi-supervised federated learning framework, Semi-FedSER, that utilizes both labeled and unlabeled data samples to address the challenge of limited labeled data samples in FL. We show that Semi-FedSER can achieve the desired SER performance even when the local label rate l=20, using two SER benchmark datasets: IEMOCAP and MSP-Improv.
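The coordination step that federated learning relies on can be sketched with federated averaging, a common aggregation scheme. This is only an illustration under stated assumptions: the abstract does not say which aggregation rule Semi-FedSER uses, and the toy 1-D linear model, the function names (`local_update`, `federated_average`, `run_rounds`), the learning rate, and the data are all hypothetical. The sketch does not cover the pseudo-labeling part of the framework.

```python
# Minimal federated averaging sketch (hypothetical toy setup, not the
# Semi-FedSER implementation). Each client fits y = w*x + b on its own
# data; only model weights, never raw data, are sent to the server.

def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few epochs of gradient descent on one client's local data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2.0 * err * x
            grad_b += 2.0 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Average client models, weighting each by its number of samples."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def run_rounds(clients, rounds=500):
    """Alternate local training and server-side averaging."""
    global_weights = (0.0, 0.0)
    sizes = [len(data) for data in clients]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights

# Two clients whose private data both follow y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = run_rounds(clients)
print(f"w={w:.4f}, b={b:.4f}")
```

The server only ever sees the `(w, b)` pairs returned by each client, which is the property the abstract highlights; in a semi-supervised variant, the local step would additionally assign pseudo-labels to unlabeled samples before training.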


S5CL: Unifying Fully-Supervised, Self-Supervised, and Semi-Supervised Learning Through Hierarchical Contrastive Learning

arXiv.org Machine Learning

In computational pathology, we often face a scarcity of annotations and a large amount of unlabeled data. One method for dealing with this is semi-supervised learning, which is commonly split into a self-supervised pretext task and a subsequent model fine-tuning. Here, we compress this two-stage training into one by introducing S5CL, a unified framework for fully-supervised, self-supervised, and semi-supervised learning. With three contrastive losses defined for labeled, unlabeled, and pseudo-labeled images, S5CL can learn feature representations that reflect the hierarchy of distance relationships: similar images and augmentations are embedded the closest, followed by different-looking images of the same class, while images from separate classes have the largest distance. Moreover, S5CL allows us to flexibly combine these losses to adapt to different scenarios. Evaluations of our framework on two public histopathological datasets show strong improvements in the case of sparse labels: for an H&E-stained colorectal cancer dataset, the accuracy increases by up to 9% compared to supervised cross-entropy loss; for a highly imbalanced dataset of single white blood cells from leukemia patient blood smears, the F1-score increases by up to 6%.


Semi-supervised Learning on Large Graphs: is Poisson Learning a Game-Changer?

arXiv.org Machine Learning

We analyze Poisson learning for graph-based semi-supervised learning to see whether it can avoid the global information loss problem that affects Laplace-based learning methods on large graphs. Our analysis shows that Poisson learning is simply Laplace regularization with thresholding, and so cannot overcome the problem.
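For readers unfamiliar with the Laplace-based baseline mentioned here, the following is a minimal sketch of Laplace (harmonic) label propagation on a small graph. It is not Poisson learning and not the paper's code; the function name `laplace_learning`, the dictionary-based graph format, and the five-node path graph are assumptions chosen for illustration.

```python
# Minimal sketch of Laplace (harmonic) learning on a graph. Labeled
# nodes keep their values; each unlabeled node is repeatedly set to the
# weighted average of its neighbors until the values settle.

def laplace_learning(adjacency, labels, iterations=200):
    """adjacency: {node: {neighbor: weight}}, labels: {node: +1.0 or -1.0}."""
    values = {node: labels.get(node, 0.0) for node in adjacency}
    for _ in range(iterations):
        for node, neighbors in adjacency.items():
            if node in labels:
                continue  # labeled nodes act as fixed boundary conditions
            total_weight = sum(neighbors.values())
            values[node] = sum(
                weight * values[nb] for nb, weight in neighbors.items()
            ) / total_weight
    return values

# Path graph 0 - 1 - 2 - 3 - 4 with unit edge weights.
adjacency = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0},
    4: {3: 1.0},
}
labels = {0: 1.0, 4: -1.0}  # only the two endpoints are labeled

values = laplace_learning(adjacency, labels)
# Thresholding the real-valued solution at zero gives class predictions.
predicted = {node: (1 if v >= 0 else -1) for node, v in values.items()}
print(values)
```

On this path graph the solution interpolates linearly between the two labeled endpoints. On large graphs with very few labels, such solutions tend to flatten out far from the labeled nodes, which is the kind of degradation the abstract refers to.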


An Information-theoretical Approach to Semi-supervised Learning under Covariate-shift

arXiv.org Machine Learning

A common assumption in semi-supervised learning is that the labeled, unlabeled, and test data are drawn from the same distribution. However, this assumption is not satisfied in many applications. In many scenarios, the data is collected sequentially (e.g., healthcare) and the distribution of the data may change over time, often exhibiting so-called covariate shift. In this paper, we propose an approach for semi-supervised learning algorithms that is capable of addressing this issue. Our framework also recovers some popular methods, including entropy minimization and pseudo-labeling. We provide new information-theoretic upper bounds on the generalization error, inspired by our novel framework. Our bounds are applicable to both general semi-supervised learning and the covariate-shift scenario. Finally, we show numerically that our method outperforms previous approaches proposed for semi-supervised learning under covariate shift.
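The two popular methods named in this abstract, entropy minimization and pseudo-labeling, both define a loss on the predicted class probabilities of an unlabeled sample. The sketch below shows the standard textbook forms of those two per-sample terms; it is not the paper's framework, and the function names and the 0.9 confidence threshold are assumptions chosen for illustration.

```python
import math

# Per-sample unsupervised loss terms, computed from a model's predicted
# class probabilities for one unlabeled example (hypothetical helpers).

def entropy(probs):
    """Entropy minimization term: low when the prediction is confident."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pseudo_label(probs, threshold=0.9):
    """Return the most likely class if it is confident enough, else None."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None

def pseudo_label_loss(probs, threshold=0.9):
    """Cross-entropy against the pseudo-label; zero if no label is assigned."""
    label = pseudo_label(probs, threshold)
    return 0.0 if label is None else -math.log(probs[label])

confident = [0.95, 0.05]
uncertain = [0.6, 0.4]
print(entropy(confident), entropy(uncertain))
print(pseudo_label(confident), pseudo_label(uncertain))
```

Both terms reward confident predictions on unlabeled data, which is reasonable when that data shares the labeled distribution; under covariate shift, the setting the abstract addresses, confident predictions on shifted inputs may be wrong, so these terms alone offer no protection.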