GitHub - lucidrains/imagen-pytorch: Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch
It is the new SOTA for text-to-image synthesis. Architecturally, it is actually much simpler than DALL-E2. It consists of a cascading DDPM conditioned on text embeddings from a large pretrained T5 model (attention network). It also contains dynamic clipping for improved classifier free guidance, noise level conditioning, and a memory efficient unet design. It appears neither CLIP nor prior network is needed after all.
May-27-2022, 21:28:07 GMT
- Technology: