AITopics | drawbench

drawbench

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Neural Information Processing Systems | Feb-12-2026, 16:42:32 GMT

While conceptually simple and easy to train, Imagen yields surprisingly strong results. On COCO [38], Imagen achieves a zero-shot FID-30K of 7.27, significantly outperforming prior work such as GLIDE [43] (FID 12.4) and the concurrent work DALL-E 2 [56] (FID 10.4).
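FID (Fréchet Inception Distance) compares real and generated images through the statistics of their features from a pretrained Inception network; lower is better. Treating each feature set as a Gaussian with mean mu and covariance Sigma, FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The NumPy sketch below computes only this distance and assumes the feature vectors have already been extracted; Inception feature extraction and the exact sampling protocol behind "FID-30K" are not shown, and the function names are illustrative.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})
    """
    diff = mu1 - mu2
    # Symmetric square root of sigma1 via eigendecomposition (covariances are PSD).
    w, v = np.linalg.eigh(sigma1)
    sqrt_s1 = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    # sigma1 @ sigma2 has the same eigenvalues as sqrt_s1 @ sigma2 @ sqrt_s1, which
    # is symmetric PSD, so Tr((sigma1 sigma2)^{1/2}) = sum of sqrt of its eigenvalues.
    m = sqrt_s1 @ sigma2 @ sqrt_s1
    eig = np.linalg.eigvalsh((m + m.T) / 2.0)
    tr_covmean = np.sum(np.sqrt(np.clip(eig, 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(feats_real, feats_gen):
    """FID between two (n_samples, dim) arrays of precomputed image features."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sig_r = np.cov(feats_real, rowvar=False)
    sig_g = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu_r, sig_r, mu_g, sig_g)
```

Identical feature sets give an FID of (numerically) zero; any shift in the mean or mismatch in the covariance increases it.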

artificial intelligence, machine learning, natural language, (18 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Dominican Republic (0.04)
North America > Canada > Ontario > Toronto (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Neural Information Processing Systems | Dec-25-2025, 15:51:28 GMT

Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g., T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.
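The side-by-side comparisons described in the abstract ask raters to choose between two models' outputs for the same prompt, separately for sample quality and for image-text alignment. A minimal sketch of tallying such votes into preference fractions follows. It assumes, hypothetically, that each vote is recorded as "A", "B", or "tie"; the labels, function name, and toy votes are illustrative and are not the paper's evaluation code or data.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical vote labels: "A" = rater prefers model A, "B" = prefers model B,
# "tie" = rater is indifferent. These labels are assumptions for illustration.
LABELS = ("A", "B", "tie")


def preference_rates(votes: Iterable[str]) -> Dict[str, float]:
    """Convert side-by-side rater votes into the fraction of votes per label."""
    counts = Counter(votes)
    unexpected = set(counts) - set(LABELS)
    if unexpected:
        raise ValueError(f"unexpected vote labels: {sorted(unexpected)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes to summarize")
    return {label: counts[label] / total for label in LABELS}


# Tally each evaluation question separately.
# Toy votes for illustration only; these are not results from the paper.
votes_by_question = {
    "sample_quality": ["A", "A", "B", "tie"],
    "image_text_alignment": ["A", "B", "B", "A"],
}
summary = {q: preference_rates(v) for q, v in votes_by_question.items()}
```

Keeping the "tie" fraction separate, rather than folding it into either side, avoids hiding how often raters saw no clear difference.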

language model, name change, photorealistic text-to-image diffusion model, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.59)

ec795aeadae0b7d230fa35cbaf04c041-Paper-Conference.pdf

Neural Information Processing Systems | Aug-19-2025, 16:58:46 GMT

diffusion model, machine learning, natural language, (17 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Sports (0.46)
Law (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Neural Information Processing Systems | Aug-12-2025, 23:18:31 GMT

Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g., T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.

artificial intelligence, machine learning, photorealistic text-to-image diffusion model, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.61)

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Neural Information Processing Systems | Jan-19-2025, 05:32:49 GMT

Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g., T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.

image-text alignment, language model, photorealistic text-to-image diffusion model, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.61)

Paper Review: A Deep Dive into Imagen

#artificialintelligence | Feb-2-2023, 00:10:12 GMT

Investigating the first half of this claim, the authors present several qualitative comparisons between images generated by Imagen and by DALL-E 2. They also provide results from human evaluation experiments in which raters were asked to pick the most photorealistic image for a given text prompt or caption. Before any results are even considered, the authors have introduced a degree of subjectivity that is inherent in human evaluation experiments. The results shown in [1] must therefore be considered with care and a healthy level of skepticism. To give some context for these results, the authors include a selection of the comparisons shown to human raters in the Appendix (definitely take a look at these; for motivation, I've added an example from DALL-E 2 above). However, even with these examples, I find it difficult to judge clearly which image should be preferred. Considering the copied examples shown in the figure above, I personally believe that some of DALL-E 2's generated images are more photorealistic than Imagen's, which demonstrates the problem of subjectivity when collecting results such as these. The authors choose to ask human raters 'which image is more photorealistic?'
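Part of the reviewer's caution can be quantified: a preference rate estimated from a finite number of ratings carries sampling uncertainty. A minimal sketch of the Wilson score interval for a binomial proportion follows. It assumes each rating is an independent yes/no vote for one model, which is a simplification (ratings of the same prompt or by the same rater may be correlated), and it captures only sampling noise, not the subjectivity of the question itself. The function name and example counts are illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    successes: number of votes for one option (e.g. "prefers model A")
    n:         total number of votes counted
    z:         standard normal quantile; 1.96 gives an approximately 95% interval
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie in [0, n]")
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p + z2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom
    # Clip to [0, 1] to absorb floating-point round-off at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


lo, hi = wilson_interval(60, 100)
```

For a hypothetical 60 of 100 votes, the interval is roughly (0.502, 0.691): even a clear-looking 60% preference from 100 ratings is compatible with a near coin-flip.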

diffusion model, drawbench, imagen, (16 more...)

Genre:

Summary/Review (1.00)
Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Google's New Imagen AI Outperforms DALL-E on Text-to-Image Generation Benchmarks

#artificialintelligence | Jun-15-2022, 07:07:58 GMT

Researchers from Google's Brain Team have announced Imagen, a text-to-image AI model that can generate photorealistic images of a scene from a textual description. Imagen outperforms DALL-E 2 on the COCO benchmark and, unlike many similar models, uses a text encoder that is pre-trained only on text data. The model and several experiments were described in a paper published on arXiv. Imagen uses a Transformer language model to convert the input text into a sequence of embedding vectors. A series of three diffusion models then converts the embeddings into a 1024x1024-pixel image.
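The data flow described above can be sketched as a chain of stages. In Imagen, the cascade is a 64x64 base model followed by two text-conditioned super-resolution models (64 to 256 and 256 to 1024 pixels). In the sketch below every model is a hypothetical placeholder: `encode_text` returns random vectors instead of T5 embeddings, the base stage returns noise, and the super-resolution stages use nearest-neighbour upsampling instead of a learned denoiser. It illustrates only how text embeddings and images pass between stages, not how diffusion sampling works.

```python
import numpy as np

# Output side length of each stage: base model, then two super-resolution models.
RESOLUTIONS = (64, 256, 1024)


def encode_text(prompt: str, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for the frozen text encoder (T5 in Imagen).

    Returns a (num_tokens, dim) array, one vector per whitespace-separated token.
    Real embeddings would come from the language model; these are random.
    """
    tokens = prompt.split()
    rng = np.random.default_rng(len(tokens))
    return rng.normal(size=(len(tokens), dim))


def base_stage(text_emb: np.ndarray, res: int) -> np.ndarray:
    """Placeholder for the base text-conditioned diffusion model.

    A real model would iteratively denoise Gaussian noise while conditioning on
    text_emb; this stub ignores text_emb and returns noise of the right shape.
    """
    rng = np.random.default_rng(0)
    return rng.normal(size=(res, res, 3))


def super_res_stage(text_emb: np.ndarray, low_res: np.ndarray, res: int) -> np.ndarray:
    """Placeholder for a text-conditioned super-resolution diffusion model.

    Nearest-neighbour upsampling stands in for the learned denoiser; text_emb is
    accepted to mirror the real interface but is unused here.
    """
    if res % low_res.shape[0] != 0:
        raise ValueError("target resolution must be a multiple of the input resolution")
    factor = res // low_res.shape[0]
    return np.repeat(np.repeat(low_res, factor, axis=0), factor, axis=1)


def generate(prompt: str) -> np.ndarray:
    """Run the placeholder cascade: text -> embeddings -> 64 -> 256 -> 1024 pixels."""
    text_emb = encode_text(prompt)
    image = base_stage(text_emb, RESOLUTIONS[0])
    for res in RESOLUTIONS[1:]:
        # Each stage receives the same text embeddings plus the previous output.
        image = super_res_stage(text_emb, image, res)
    return image
```

Passing the text embeddings to every stage, not only the first, mirrors the design in which the super-resolution models are also text-conditioned.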

diffusion model, imagen, text-to-image generation benchmark, (10 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

turn any text to an image with google's latest AI tool 'imagen'

#artificialintelligence | May-29-2022, 19:11:47 GMT

Basically, the system can create photorealistic images from input text. 'We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding,' says the official paper. 'Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation.' Google claims that this level of photorealism and language understanding surpasses that of its competitors. For it to work, the program takes a text prompt (say, 'Three spheres made of glass falling into the ocean') and generates an image from it. The resulting images can be either photorealistic or more of an artistic interpretation.

imagen, latest ai tool, unprecedented degree, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)
