AITopics | sd-1


sd-1


Color Conditional Generation with Sliced Wasserstein Guidance

Neural Information Processing Systems | Jun-23-2026, 02:14:09 GMT

We propose SW-Guidance, a training-free approach for image generation conditioned on the color distribution of a reference image. While it is possible to generate an image with fixed colors by first creating an image from a text prompt and then applying a color style transfer method, this approach often results in semantically meaningless colors in the generated image. Our method solves this problem by modifying the sampling process of a diffusion model to incorporate the differentiable Sliced 1-Wasserstein distance between the color distribution of the generated image and the reference palette. Our method outperforms state-of-the-art techniques for color-conditional generation in terms of color similarity to the reference, producing images that not only match the reference colors but also maintain semantic coherence with the original text prompt.
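The distance named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `sliced_wasserstein_1`, the number of projections, and the requirement that both pixel sets have the same size are assumptions made for illustration. A guidance term inside a diffusion sampler would additionally need an autodiff framework so the distance can be differentiated with respect to the generated image.

```python
import numpy as np

def sliced_wasserstein_1(pixels_a, pixels_b, n_projections=64, seed=0):
    """Approximate sliced 1-Wasserstein distance between two (N, 3) color sets.

    Assumption: both sets contain the same number of pixels, so sorted
    projections can be compared element by element.
    """
    rng = np.random.default_rng(seed)
    # Draw random unit directions in RGB space.
    directions = rng.normal(size=(n_projections, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Project every pixel onto every direction, then sort each projection.
    # In one dimension, optimal transport pairs the sorted samples.
    proj_a = np.sort(pixels_a @ directions.T, axis=0)
    proj_b = np.sort(pixels_b @ directions.T, axis=0)
    # Average the absolute differences over pixels and directions.
    return float(np.mean(np.abs(proj_a - proj_b)))

# Usage: compare the colors of two hypothetical images, flattened to (N, 3).
rng = np.random.default_rng(1)
generated = rng.random((100, 3))
reference = rng.random((100, 3))
distance = sliced_wasserstein_1(generated, reference)
```

Because the distance depends only on the sorted projections, it compares color distributions and ignores where in the image each color appears.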

arxiv preprint arxiv, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)


860c1c657deafe09f64c013c2888bd7b-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 08:39:07 GMT

diffusion model, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)


Delta Sampling: Data-Free Knowledge Transfer Across Diffusion Models

Gao, Zhidong, Pan, Zimeng, Yao, Yuhang, Xie, Chenyue, Wei, Wei

arXiv.org Artificial Intelligence | Dec-4-2025

Diffusion models like Stable Diffusion (SD) drive a vibrant open-source ecosystem including fully fine-tuned checkpoints and parameter-efficient adapters such as LoRA, LyCORIS, and ControlNet. However, these adaptation components are tightly coupled to a specific base model, making them difficult to reuse when the base model is upgraded (e.g., from SD 1.x to 2.x) due to substantial changes in model parameters and architecture. In this work, we propose Delta Sampling (DS), a novel method that enables knowledge transfer across base models with different architectures, without requiring access to the original training data. DS operates entirely at inference time by leveraging the delta: the difference in model predictions before and after the adaptation of a base model. This delta is then used to guide the denoising process of a new base model. We evaluate DS across various SD versions, demonstrating that DS achieves consistent improvements in creating desired effects (e.g., visual styles, semantic concepts, and structures) under different sampling strategies. These results highlight DS as an effective, plug-and-play mechanism for knowledge transfer in diffusion-based image synthesis. Code: https://github.com/Zhidong-Gao/DeltaSampling
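The delta idea described above can be sketched in a few lines. The abstract only states that the delta is the difference in predictions before and after adaptation and that it guides the new base model. The additive combination, the `scale` parameter, and the function name `delta_guided_prediction` are assumptions for illustration and may differ from the released code.

```python
import numpy as np

def delta_guided_prediction(eps_new_base, eps_old_base, eps_old_adapted, scale=1.0):
    """Combine a new base model's prediction with an adaptation delta.

    All inputs are noise predictions of the same shape for the same noisy
    latent and timestep. `scale` is a hypothetical guidance strength.
    """
    # Delta: what the adaptation changed in the old base model's prediction.
    delta = eps_old_adapted - eps_old_base
    # Assumed combination: shift the new base model's prediction by the delta.
    return eps_new_base + scale * delta

# Usage with stand-in arrays in place of real model outputs.
eps_new = np.zeros((4, 8, 8))
eps_old = np.ones((4, 8, 8))
eps_old_lora = np.full((4, 8, 8), 3.0)
guided = delta_guided_prediction(eps_new, eps_old, eps_old_lora, scale=0.5)
```

With `scale=0` the new base model is left unchanged, which makes the strength of the transferred effect easy to tune at inference time under this assumed formulation.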

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.03056

Country: North America > Mexico (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)


Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

Neural Information Processing Systems | Oct-10-2025, 08:22:40 GMT

By the second iteration, SPIN-Diffusion exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.

diffusion model, diffusion-dpo, spin-diffusion, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)


MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation

Chang, Yu, Chen, Jiahao, Cheng, Anzhe, Bogdan, Paul

arXiv.org Artificial Intelligence | Sep-22-2025

Text-to-image diffusion models achieve impressive realism but often suffer from compositional failures on prompts with multiple objects, attributes, and spatial relations, resulting in cross-token interference where entities entangle, attributes mix across objects, and spatial cues are violated. To address these failures, we propose MaskAttn-SDXL, a region-level gating mechanism applied to the cross-attention logits of Stable Diffusion XL (SDXL)'s UNet. MaskAttn-SDXL learns a binary mask per layer, injecting it into each cross-attention logit map before softmax to sparsify token-to-latent interactions so that only semantically relevant connections remain active. The method requires no positional encodings, auxiliary tokens, or external region masks, and preserves the original inference path with negligible overhead. In practice, our model improves spatial compliance and attribute binding in multi-object prompts while preserving overall image quality and diversity. These findings demonstrate that logit-level masked cross-attention is a data-efficient primitive for enforcing compositional control, and our method thus serves as a practical extension for spatial control in text-to-image generation.
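The gating step described above, a binary mask applied to cross-attention logits before softmax, can be sketched as follows. This is an illustrative NumPy version, not the authors' code: the function name `masked_cross_attention` is hypothetical, the mask is passed in as a fixed array rather than learned per layer, and a single attention head is assumed.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, mask):
    """Single-head cross-attention with a binary gate on the logits.

    queries: (L, d) latent positions; keys: (T, d) and values: (T, dv) text
    tokens; mask: (L, T) with 1 for allowed pairs and 0 for gated-off pairs.
    """
    logits = queries @ keys.T / np.sqrt(queries.shape[-1])
    # Gated-off pairs get a very negative logit, so softmax gives them ~0 weight.
    logits = np.where(mask.astype(bool), logits, -1e9)
    # Numerically stable softmax over the token axis.
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

# Usage: 4 latent positions attending to 3 text tokens.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(3, 8))
v = rng.normal(size=(3, 5))
gate = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 1, 1]])
output, attention = masked_cross_attention(q, k, v, gate)
```

Applying the gate to the logits rather than to the softmax output keeps each row of attention weights normalized over the tokens that remain active.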

artificial intelligence, machine learning, maskattn-sdxl, (15 more...)

arXiv.org Artificial Intelligence

2509.15357

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution

Wu, Yixin, Shen, Yun, Backes, Michael, Zhang, Yang

arXiv.org Artificial Intelligence | Aug-30-2024

Text-to-image models, such as Stable Diffusion (SD), undergo iterative updates to improve image quality and address concerns such as safety. Improvements in image quality are straightforward to assess. However, how model updates resolve existing concerns and whether they raise new questions remain unexplored. This study takes an initial step in investigating the evolution of text-to-image models from the perspectives of safety, bias, and authenticity. Our findings, centered on Stable Diffusion, indicate that model updates paint a mixed picture. While updates progressively reduce the generation of unsafe images, the bias issue, particularly in gender, intensifies. We also find that negative stereotypes either persist within the same Non-White race group or shift towards other Non-White race groups through SD updates, yet with minimal association of these traits with the White race group. Additionally, our evaluation reveals a new concern stemming from SD updates: State-of-the-art fake image detectors, initially trained for earlier SD versions, struggle to identify fake images generated by updated versions. We show that fine-tuning these detectors on fake images generated by updated versions achieves at least 96.6% accuracy across various SD versions, addressing this issue. Our insights highlight the importance of continued efforts to mitigate biases and vulnerabilities in evolving text-to-image models.

sd version, sd-1, sd-2, (17 more...)

arXiv.org Artificial Intelligence

2408.17285

Country:

Africa (0.14)
North America > United States > Virginia (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)


Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

Yuan, Huizhuo, Chen, Zixiang, Ji, Kaixuan, Gu, Quanquan

arXiv.org Artificial Intelligence | Feb-15-2024

Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.

diffusion model, diffusion-dpo, spin-diffusion, (15 more...)

arXiv.org Artificial Intelligence

2402.1021

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Games (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)
