94c28dcfc97557df0df6d1f7222fc384-Paper.pdf

Neural Information Processing Systems

However, most of these models do not support the other crucial ability of a generative model: generating imaginary observations by learning the density of the observed data. Although this ability to imagine according to the density of the possible worlds plays a crucial role, e.g., in world models required for planning and model-based reinforcement learning.


Security News This Week: ICE Can Now Spy on Every Phone in Your Neighborhood

WIRED

Plus: Iran shuts down its internet amid sweeping protests, an alleged scam boss gets extradited to China, and more. After a federal agent shot and killed 37-year-old Renee Good in Minneapolis on Wednesday, WIRED surfaced December federal court testimony from the reported ICE shooter, Jonathan Ross. In it, he said he was a firearms trainer and that he has had "hundreds" of encounters with drivers in a professional capacity during enforcement actions. Separately, we looked at how the tactics behind protest policing are moving toward intentional antagonism. If you haven't seen it, here's our guide to protesting safely in the age of surveillance.


PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph

Neural Information Processing Systems

Despite some exciting progress on high-quality image generation from structured (scene graphs) or free-form (sentences) descriptions, most such methods only guarantee image-level semantic consistency, i.e., the generated image matching the semantic meaning of the description. They still lack the ability to synthesize images in a more controllable way, such as finely manipulating the visual appearance of every object. Therefore, to generate images with preferred objects and rich interactions, we propose a semi-parametric method, PasteGAN, for generating an image from a scene graph and image crops, where the spatial arrangements of the objects and their pair-wise relationships are defined by the scene graph and the object appearances are determined by the given object crops. To enhance the interactions of the objects in the output, we design a Crop Refining Network and an Object-Image Fuser to embed the objects as well as their relationships into one map. Multiple losses work collaboratively to guarantee that the generated images highly respect the crops and comply with the scene graphs while maintaining excellent image quality. If the crops are not provided, a crop selector picks the most compatible crops from our external object tank by encoding the interactions around the objects in the scene graph. Evaluated on the Visual Genome and COCO-Stuff datasets, our proposed method significantly outperforms the SOTA methods on Inception Score, Diversity Score and Fréchet Inception Distance. Extensive experiments also demonstrate our method's ability to generate complex and diverse images with given objects.
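The crop selector described above retrieves compatible crops by comparing embeddings. A minimal sketch of that retrieval step, assuming cosine similarity between a scene-graph object embedding and candidate crop embeddings (function names and the similarity choice are illustrative, not the paper's exact implementation):

```python
import numpy as np

def select_crop(object_embedding, crop_embeddings):
    """Return the index of the crop whose embedding is most similar
    (cosine similarity) to the scene-graph object embedding.

    object_embedding: shape (d,)
    crop_embeddings:  shape (n_crops, d), one row per candidate crop
    """
    obj = object_embedding / np.linalg.norm(object_embedding)
    crops = crop_embeddings / np.linalg.norm(
        crop_embeddings, axis=1, keepdims=True
    )
    scores = crops @ obj  # cosine similarity against each candidate
    return int(np.argmax(scores))
```

In the paper the object embedding would additionally encode the interactions around the object in the scene graph, so the selected crop fits its relational context rather than just its category.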



AI firm wins high court ruling after photo agency's copyright claim

The Guardian

Stability AI's model allows users to generate images with text prompts. There was evidence that Getty's images were used to train Stability's model. Stability was also found to have infringed Getty's trademarks in some cases. The judge, Mrs Justice Joanna Smith, said the question of where to strike the balance between the interests of the creative industries on one side and the AI industry on the other was "of very real societal importance".




Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models

Cheng, Min, Doudi, Fatemeh, Kalathil, Dileep, Ghavamzadeh, Mohammad, Kumar, Panganamala R.

arXiv.org Artificial Intelligence

Reinforcement learning (RL) algorithms have been used recently to align diffusion models with downstream objectives such as aesthetic quality and text-image consistency by fine-tuning them to maximize a single reward function under a fixed KL regularization. However, this approach is inherently restrictive in practice, where alignment must balance multiple, often conflicting objectives. Moreover, user preferences vary across prompts, individuals, and deployment contexts, with varying tolerances for deviation from a pre-trained base model. We address the problem of inference-time multi-preference alignment: given a set of basis reward functions and a reference KL regularization strength, can we design a fine-tuning procedure so that, at inference time, it can generate images aligned with any user-specified linear combination of rewards and regularization, without requiring additional fine-tuning? We propose Diffusion Blend, a novel approach to solve inference-time multi-preference alignment by blending backward diffusion processes associated with fine-tuned models, and we instantiate this approach with two algorithms: DB-MPA for multi-reward alignment and DB-KLA for KL regularization control. Extensive experiments show that Diffusion Blend algorithms consistently outperform relevant baselines and closely match or exceed the performance of individually fine-tuned models, enabling efficient, user-driven alignment at inference time. The code is available at https://github.com/bluewoods127/DB-2025.
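The core inference-time operation described above, combining the backward processes of per-reward fine-tuned models according to user-specified preference weights, can be sketched as a weighted combination of the models' noise predictions at each denoising step. This is an illustrative simplification under the assumption that blending acts on the noise predictions; the paper's exact blending rule may differ:

```python
import numpy as np

def blended_eps(eps_predictions, weights):
    """Blend per-reward noise predictions at one denoising step.

    eps_predictions: list of arrays, one noise prediction per
                     fine-tuned model (same shape each)
    weights:         user-specified preference weights, one per model
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize to a convex combination
    return sum(w * eps for w, eps in zip(weights, eps_predictions))
```

Because the weights enter only at inference time, a user can re-balance the reward trade-off per prompt without any additional fine-tuning, which is the practical point of the approach.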


Responsible Diffusion Models via Constraining Text Embeddings within Safe Regions

Li, Zhiwen, Chen, Die, Fan, Mingyuan, Chen, Cen, Li, Yaliang, Wang, Yanhao, Zhou, Wenmeng

arXiv.org Artificial Intelligence

The remarkable ability of diffusion models to generate high-fidelity images has led to their widespread adoption. However, concerns have also arisen regarding their potential to produce Not Safe for Work (NSFW) content and exhibit social biases, hindering their practical use in real-world applications. In response to this challenge, prior work has focused on employing security filters to identify and exclude toxic text, or alternatively, fine-tuning pre-trained diffusion models to erase sensitive concepts. Unfortunately, existing methods struggle to achieve satisfactory performance in the sense that they can have a significant impact on the normal model output while still failing to prevent the generation of harmful content in some cases. In this paper, we propose a novel self-discovery approach to identifying a semantic direction vector in the embedding space to restrict text embedding within a safe region. Our method circumvents the need for correcting individual words within the input text and steers the entire text prompt towards a safe region in the embedding space, thereby enhancing model robustness against all possibly unsafe prompts. In addition, we employ Low-Rank Adaptation (LoRA) for semantic direction vector initialization to reduce the impact on the model performance for other semantics. Furthermore, our method can also be integrated with existing methods to improve their social responsibility. Extensive experiments on benchmark datasets demonstrate that our method can effectively reduce NSFW content and mitigate social bias generated by diffusion models compared to several state-of-the-art baselines.
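The steering step described above, shifting an entire prompt embedding toward a safe region rather than editing individual words, can be sketched as adding a learned semantic direction vector to the text embedding. This is a hypothetical sketch: the function name and the additive update are assumptions, and in the actual method the direction is found via self-discovery and initialized with LoRA rather than given directly:

```python
import numpy as np

def steer_to_safe_region(text_embedding, safe_direction, strength=1.0):
    """Shift a prompt embedding along a learned 'safe' semantic direction.

    text_embedding: shape (d,), the full prompt embedding
    safe_direction: shape (d,), learned direction toward the safe region
    strength:       how far to push along the (unit-normalized) direction
    """
    d = safe_direction / np.linalg.norm(safe_direction)
    # Apply the same shift to the whole prompt embedding, so no
    # individual word needs to be identified or corrected.
    return text_embedding + strength * d
```

Operating on the whole embedding is what gives robustness against unsafe prompts the filter designers never anticipated, since the constraint does not depend on recognizing specific toxic tokens.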