image generator
Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
We argue that diffusion models' success in modeling complex distributions is, for the most part, coming from their conditioning. This paper investigates the representation used to condition diffusion models from the perspective that ideal representations should improve modeling the data distribution, be easy to generate, and be compositional to allow generalizing outside the training distribution. We introduce Discrete Latent Code (DLC), an image representation derived from Simplicial Embeddings trained with a self-supervised learning objective. DLCs are sequences of discrete tokens, as opposed to the standard continuous image embeddings. They are easy to generate and their compositionality enables sampling of novel images beyond the training distribution. Diffusion models trained with DLCs improve generation fidelity, establishing a new state-of-the-art for unconditional image generation on ImageNet. Additionally, we show that composing DLCs allows the image generator to produce interesting out-of-distribution samples that coherently combine the semantics of images in diverse ways. Finally, we showcase how DLCs can enable text-to-image generation by leveraging large-scale pretrained language models. Using only 9M image-caption pairs, we efficiently finetune a text diffusion model to generate novel DLCs that produces samples outside of the data distribution used to train the image generator.
An AI image generator for non-English speakers
Although text-to-image generation is rapidly advancing, these AI models are mostly English-centric. Researchers at the University of Amsterdam Faculty of Science have created NeoBabel, an AI image generator that can work in six different languages. By making all elements of their research open source, anyone can build on the model and help push inclusive AI research. When you generate an image with AI, the results are often better when your prompt is in English. This is because many AI models are English at their core: if you use another language, your prompt is translated into English before the image is created.
Return of Unconditional Generation: A Self-supervised Representation Generation Method
Unconditional generation--the problem of modeling data distribution without relying on human-annotated labels--is a long-standing and fundamental challenge in generative models, creating a potential of learning from large-scale unlabeled data. In the literature, the generation quality of an unconditional method has been much worse than that of its conditional counterpart. This gap can be attributed to the lack of semantic information provided by labels. In this work, we show that one can close this gap by generating semantic representations in the representation space produced by a self-supervised encoder. These representations can be used to condition the image generator.
Meta-Reinforced Synthetic Data for One-Shot Fine-Grained Visual Recognition
This paper studies the task of one-shot fine-grained recognition, which suffers from the problem of data scarcity of novel fine-grained classes. To alleviate this problem, a off-the-shelf image generator can be applied to synthesize additional images to help one-shot learning. However, such synthesized images may not be helpful in one-shot fine-grained recognition, due to a large domain discrepancy between synthesized and original images. To this end, this paper proposes a meta-learning framework to reinforce the generated images by original images so that these images can facilitate one-shot learning. Specifically, the generic image generator is updated by few training instances of novel classes; and a Meta Image Reinforcing Network (MetaIRNet) is proposed to conduct one-shot fine-grained recognition as well as image reinforcement. The model is trained in an end-to-end manner, and our experiments demonstrate consistent improvement over baseline on one-shot fine-grained image classification benchmarks.
SynthPix: A lightspeed PIV images generator
Terpin, Antonio, Bonomi, Alan, Banelli, Francesco, D'Andrea, Raffaello
We describe SynthPix, a synthetic image generator for Particle Image Velocimetry (PIV) with a focus on performance and parallelism on accelerators, implemented in JAX. SynthPix supports the same configuration parameters as existing tools but achieves a throughput several orders of magnitude higher in image-pair generation per second. SynthPix was developed to enable the training of data-hungry reinforcement learning methods for flow estimation and for reducing the iteration times during the development of fast flow estimation methods used in recent active fluids control studies with real-time PIV feedback. We believe SynthPix to be useful for the fluid dynamics community, and in this paper we describe the main ideas behind this software package.
Image Generation as a Visual Planner for Robotic Manipulation
Generating realistic robotic manipulation videos is an important step toward unifying perception, planning, and action in embodied agents. While existing video diffusion models require large domain-specific datasets and struggle to generalize, recent image generation models trained on language-image corpora exhibit strong compositionality, including the ability to synthesize temporally coherent grid images. This suggests a latent capacity for video-like generation even without explicit temporal modeling. We explore whether such models can serve as visual planners for robots when lightly adapted using LoRA finetuning. We propose a two-part framework that includes: (1) text-conditioned generation, which uses a language instruction and the first frame, and (2) trajectory-conditioned generation, which uses a 2D trajectory overlay and the same initial frame. Experiments on the Jaco Play dataset, Bridge V2, and the RT1 dataset show that both modes produce smooth, coherent robot videos aligned with their respective conditions. Our findings indicate that pretrained image generators encode transferable temporal priors and can function as video-like robotic planners under minimal supervision. Code is released at \href{https://github.com/pangye202264690373/Image-Generation-as-a-Visual-Planner-for-Robotic-Manipulation}{https://github.com/pangye202264690373/Image-Generation-as-a-Visual-Planner-for-Robotic-Manipulation}.
Hands On With Google's Nano Banana Pro Image Generator
Google's latest AI image model is vastly better than the previous release at generating text in images. You can expect companies to go buck wild with this update. Nano Banana Pro generated this image, assembling a crowd of standalone characters into one scene. Corporate AI slop feels inescapable in 2025. From website banner ads to outdoor billboards, images generated by businesses using AI tools surround me.
Learning Hierarchical Semantic Image Manipulation through Structured Representations
Seunghoon Hong, Xinchen Yan, Thomas S. Huang, Honglak Lee
Then our image generator fills in the pixel-level textures guided by the semantic layout. Such framework allows a user to manipulate images at object-level by adding, removing, and moving one bounding box at a time. Experimental evaluations demonstrate the advantages of the hierarchical manipulation framework over existing image generation and context hole-filing models, both qualitatively and quantitatively.
Return of Unconditional Generation: A Self-supervised Representation Generation Method
Unconditional generation--the problem of modeling data distribution without relying on human-annotated labels--is a long-standing and fundamental challenge in generative models, creating a potential of learning from large-scale unlabeled data. In the literature, the generation quality of an unconditional method has been much worse than that of its conditional counterpart. This gap can be attributed to the lack of semantic information provided by labels. In this work, we show that one can close this gap by generating semantic representations in the representation space produced by a self-supervised encoder. These representations can be used to condition the image generator.