CLIP-GEN Overview

May-9-2022, 23:10:06 GMT–#artificialintelligence

Training a general-domain text-to-image generator such as DALL-E, GauGAN, or CogView requires huge amounts of paired text-image data, which can be difficult and expensive to collect. In this paper, the authors propose CLIP-GEN, a self-supervised scheme for general text-to-image generation that uses the language-image priors extracted by a pre-trained CLIP model. Training the generator requires only a set of unlabeled images in the general domain. First, the CLIP image encoder extracts the embedding of each image in the joint language-vision embedding space. Next, the image is converted into a sequence of discrete tokens in the VQGAN codebook space (the VQGAN itself can be trained on unlabeled data).
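The two preprocessing steps described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the real system uses a pre-trained CLIP encoder and a trained VQGAN, whereas here the image embedding and patch features are plain lists supplied by the caller, and the codebook is a toy table. The names `l2_normalize`, `quantize`, and `preprocess_image` are hypothetical, and the L2 normalization of the embedding is an assumption (a common convention for CLIP embeddings) rather than something the summary states.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumed convention for the embedding)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def quantize(patch_features, codebook):
    """Map each patch feature to the index of its nearest codebook entry.

    This is the nearest-neighbour lookup at the heart of vector
    quantization; a real VQGAN would produce patch_features with a
    learned encoder and use a learned codebook.
    """
    tokens = []
    for feat in patch_features:
        dists = [
            sum((f - c) ** 2 for f, c in zip(feat, code))
            for code in codebook
        ]
        tokens.append(dists.index(min(dists)))
    return tokens


def preprocess_image(image_embedding, patch_features, codebook):
    """Return (embedding, discrete tokens) for one unlabeled image.

    image_embedding stands in for the output of the CLIP encoder;
    patch_features stands in for the output of the VQGAN encoder.
    """
    return l2_normalize(image_embedding), quantize(patch_features, codebook)


# Toy data: a 3-entry codebook and three 2-D patch features.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
patches = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]
embedding, tokens = preprocess_image([3.0, 4.0], patches, codebook)
print(tokens)
```

Each image thus yields one embedding and one token sequence without any caption being involved, which is why unlabeled images suffice for these two steps.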
