Unsupervised or Indirectly Supervised Learning
Perceived Realism of High-Resolution Generative Adversarial Networkโderived Synthetic Mammograms
To explore whether generative adversarial networks (GANs) can enable synthesis of realistic medical images that are indiscernible from real images, even by domain experts. In this retrospective study, progressive growing GANs were used to synthesize mammograms at a resolution of 1280 1024 pixels by using images from 90 000 patients (average age, 56 years 9) collected between 2009 and 2019. To evaluate the results, a method to assess distributional alignment for ultraโhigh-dimensional pixel distributions was used, which was based on moment plots. This method was able to reveal potential sources of misalignment. A total of 117 volunteer participants (55 radiologists and 62 nonradiologists) took part in a study to assess the realism of synthetic images from GANs.
Unsupervised Learning: How Machines Learn on Their Own
This type of machine learning (ML) grants AI applications the ability to learn and find hidden patterns in large datasets without human supervision. Unsupervised learning is also crucial for achieving artificial general intelligence. Labeling data is labor-intensive and time-consuming, and in many cases, impractical. That's where unsupervised learning brings a big difference by granting AI applications the ability to learn without labels and supervision. Unsupervised learning (UL) is a machine learning technique used to identify patterns in datasets containing unclassified and unlabeled data points. In this learning method, an AI system is given only the input data and no corresponding output data.
Cycle-free CycleGAN using Invertible Generator for Unsupervised Low-Dose CT Denoising
Recently, CycleGAN was shown to provide high-performance, ultra-fast denoising for low-dose X-ray computed tomography (CT) without the need for a paired training dataset. Although this was possible thanks to cycle consistency, CycleGAN requires two generators and two discriminators to enforce cycle consistency, demanding significant GPU resources and technical skills for training. A recent proposal of tunable CycleGAN with Adaptive Instance Normalization (AdaIN) alleviates the problem in part by using a single generator. However, two discriminators and an additional AdaIN code generator are still required for training. To solve this problem, here we present a novel cycle-free Cycle-GAN architecture, which consists of a single generator and a discriminator but still guarantees cycle consistency. The main innovation comes from the observation that the use of an invertible generator automatically fulfills the cycle consistency condition and eliminates the additional discriminator in the CycleGAN formulation. To make the invertible generator more effective, our network is implemented in the wavelet residual domain. Extensive experiments using various levels of low-dose CT images confirm that our method can significantly improve denoising performance using only 10% of learnable parameters and faster training time compared to the conventional CycleGAN.
Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization
Li, Daiqing, Yang, Junlin, Kreis, Karsten, Torralba, Antonio, Fidler, Sanja
Training deep networks with limited labeled data while achieving a strong generalization ability is key in the quest to reduce human annotation efforts. This is the goal of semi-supervised learning, which exploits more widely available unlabeled data to complement small labeled data sets. In this paper, we propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels. Concretely, we learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images supplemented with only few labeled ones. We build our architecture on top of StyleGAN2, augmented with a label synthesis branch. Image labeling at test time is achieved by first embedding the target image into the joint latent space via an encoder network and test-time optimization, and then generating the label from the inferred embedding. We evaluate our approach in two important domains: medical image segmentation and part-based face segmentation. We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization, such as transferring from CT to MRI in medical imaging, and photographs of real faces to paintings, sculptures, and even cartoons and animal faces. Project Page: \url{https://nv-tlabs.github.io/semanticGAN/}
Self-Training with Weak Supervision
Karamanolakis, Giannis, Mukherjee, Subhabrata, Zheng, Guoqing, Awadallah, Ahmed Hassan
State-of-the-art deep neural networks require large-scale labeled training data that is often expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such settings to automatically generate weakly labeled training data. However, learning with weak rules is challenging due to their inherent heuristic and noisy nature. An additional challenge is rule coverage and overlap, where prior work on weak supervision only considers instances that are covered by weak rules, thus leaving valuable unlabeled data behind. In this work, we develop a weak supervision framework (ASTRA) that leverages all the available data for a given task. To this end, we leverage task-specific unlabeled data through self-training with a model (student) that considers contextualized representations and predicts pseudo-labels for instances that may not be covered by weak rules. We further develop a rule attention network (teacher) that learns how to aggregate student pseudo-labels with weak rule labels, conditioned on their fidelity and the underlying context of an instance. Finally, we construct a semi-supervised learning objective for end-to-end training with unlabeled data, domain-specific rules, and a small amount of labeled data. Extensive experiments on six benchmark datasets for text classification demonstrate the effectiveness of our approach with significant improvements over state-of-the-art baselines.
Generative Landmarks
We propose a general purpose approach to detect landmarks with improved temporal consistency, and personalization. Most sparse landmark detection methods rely on laborious, manually labelled landmarks, where inconsistency in annotations over a temporal volume leads to sub-optimal landmark learning. Further, high-quality landmarks with personalization is often hard to achieve. We pose landmark detection as an image translation problem. We capture two sets of unpaired marked (with paint) and unmarked videos. We then use a generative adversarial network and cyclic consistency to predict deformations of landmark templates that simulate markers on unmarked images until these images are indistinguishable from ground-truth marked images. Our novel method does not rely on manually labelled priors, is temporally consistent, and image class agnostic -- face, and hand landmarks detection examples are shown.
Relieving the Plateau: Active Semi-Supervised Learning for a Better Landscape
Kong, Seo Taek, Jeon, Soomin, Lee, Jaewon, Lee, Hongseok, Jung, Kyu-Hwan
Deep learning (DL) relies on massive amounts of labeled data, and improving its labeled sample-efficiency remains one of the most important problems since its advent. Semi-supervised learning (SSL) leverages unlabeled data that are more accessible than their labeled counterparts. Active learning (AL) selects unlabeled instances to be annotated by a human-in-the-loop in hopes of better performance with less labeled data. Given the accessible pool of unlabeled data in pool-based AL, it seems natural to use SSL when training and AL to update the labeled set; however, algorithms designed for their combination remain limited. In this work, we first prove that convergence of gradient descent on sufficiently wide ReLU networks can be expressed in terms of their Gram matrix' eigen-spectrum. Equipped with a few theoretical insights, we propose convergence rate control (CRC), an AL algorithm that selects unlabeled data to improve the problem conditioning upon inclusion to the labeled set, by formulating an acquisition step in terms of improving training dynamics. Extensive experiments show that SSL algorithms coupled with CRC can achieve high performance using very few labeled data.
Streaming Self-Training via Domain-Agnostic Unlabeled Images
Lin, Zhiqiu, Ramanan, Deva, Bansal, Aayush
We present streaming self-training (SST) that aims to democratize the process of learning visual recognition models such that a non-expert user can define a new task depending on their needs via a few labeled examples and minimal domain knowledge. Key to SST are two crucial observations: (1) domain-agnostic unlabeled images enable us to learn better models with a few labeled examples without any additional knowledge or supervision; and (2) learning is a continuous process and can be done by constructing a schedule of learning updates that iterates between pre-training on novel segments of the streams of unlabeled data, and fine-tuning on the small and fixed labeled dataset. This allows SST to overcome the need for a large number of domain-specific labeled and unlabeled examples, exorbitant computational resources, and domain/task-specific knowledge. In this setting, classical semi-supervised approaches require a large amount of domain-specific labeled and unlabeled examples, immense resources to process data, and expert knowledge of a particular task. Due to these reasons, semi-supervised learning has been restricted to a few places that can house required computational and human resources. In this work, we overcome these challenges and demonstrate our findings for a wide range of visual recognition tasks including fine-grained image classification, surface normal estimation, and semantic segmentation. We also demonstrate our findings for diverse domains including medical, satellite, and agricultural imagery, where there does not exist a large amount of labeled or unlabeled data.
ProteinGAN: A generative adversarial network that generates functional protein sequences
Proteins are large, highly complex and naturally occurring molecules can be found in all living organisms. These unique substances, which consist of amino acids joined together by peptide bonds to form long chains, can have a variety of functions and properties. The specific order in which different amino acids are arranged to form a given protein ultimately determines the protein's 3D structure, physicochemical properties and molecular function. While scientists have been studying proteins for decades, designing proteins that elicit specific chemical reactions has so far proved to be highly challenging. Researchers at Biomatter Designs, Vilnius University in Lithuania, and Chalmers University of Technology in Sweden have recently developed ProteinGAN, a generative adversarial network (GAN) that can process and'learn' different natural protein sequences.
Multiview Pseudo-Labeling for Semi-supervised Learning from Video
Xiong, Bo, Fan, Haoqi, Grauman, Kristen, Feichtenhofer, Christoph
We present a multiview pseudo-labeling approach to video learning, a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video. The complementary views help obtain more reliable pseudo-labels on unlabeled video, to learn stronger video representations than from purely supervised data. Though our method capitalizes on multiple views, it nonetheless trains a model that is shared across appearance and motion input and thus, by design, incurs no additional computation overhead at inference time. On multiple video recognition datasets, our method substantially outperforms its supervised counterpart, and compares favorably to previous work on standard benchmarks in self-supervised video representation learning.