The CLIP Model is Secretly an Image-to-Prompt Converter

Neural Information Processing Systems 

Stable Diffusion is a prominent text-to-image generation model that takes a text prompt as input and encodes it with the Contrastive Language-Image Pre-Training (CLIP) model.
