AITopics

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Neural Information Processing SystemsFeb-14-2026, 10:55:35 GMT

Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. Using this web app we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users'

artificial intelligence, machine learning, pickscore, (18 more...)

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Industry: Information Technology (0.55)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)

Neural Information Processing SystemsDec-26-2025, 01:43:37 GMT

Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. Using this web app we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users' preferences over generated images. We leverage this dataset to train a CLIP-based scoring function, PickScore, which exhibits superhuman performance on the task of predicting human preferences. Then, we test PickScore's ability to perform model evaluation and observe that it correlates better with human rankings than other automatic evaluation metrics. Therefore, we recommend using PickScore for evaluating future text-to-image generation models, and using Pick-a-Pic prompts as a more relevant dataset than MS-COCO. Finally, we demonstrate how PickScore can enhance existing text-to-image models via ranking.

open dataset, pick-a-pic, user preference, (7 more...)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Neural Information Processing SystemsOct-10-2025, 08:22:40 GMT

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.

diffusion model, diffusion-dpo, spin-diffusion, (15 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Neural Information Processing SystemsOct-8-2025, 22:01:06 GMT

73aacd8b3b05b4b503d58310b523553c-Paper-Conference.pdf

machine learning, natural language, pickscore, (20 more...)

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (0.99)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Communications > Social Media (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

arXiv.org Artificial IntelligenceSep-30-2025

Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models

Xue, Shuchen, Ge, Chongjian, Zhang, Shilong, Li, Yichen, Ma, Zhi-Ming

Reinforcement Learning (RL) has emerged as a central paradigm for advancing Large Language Models (LLMs), where pre-training and RL post-training share the same log-likelihood formulation. In contrast, recent RL approaches for diffusion models, most notably Denoising Diffusion Policy Optimization (DDPO), optimize an objective different from the pretraining objectives--score/flow matching loss. In this work, we establish a novel theoretical analysis: DDPO is an implicit form of score/flow matching with noisy targets, which increases variance and slows convergence. Building on this analysis, we introduce \textbf{Advantage Weighted Matching (AWM)}, a policy-gradient method for diffusion. It uses the same score/flow-matching loss as pretraining to obtain a lower-variance objective and reweights each sample by its advantage. In effect, AWM raises the influence of high-reward samples and suppresses low-reward ones while keeping the modeling objective identical to pretraining. This unifies pretraining and RL conceptually and practically, is consistent with policy-gradient theory, reduces variance, and yields faster convergence. This simple yet effective design yields substantial benefits: on GenEval, OCR, and PickScore benchmarks, AWM delivers up to a $24\times$ speedup over Flow-GRPO (which builds on DDPO), when applied to Stable Diffusion 3.5 Medium and FLUX, without compromising generation quality. Code is available at https://github.com/scxue/advantage_weighted_matching.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

2509.2505

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Artificial IntelligenceJul-30-2025

Sem-DPO: Mitigating Semantic Inconsistency in Preference Optimization for Prompt Engineering

Mohamed, Anas, Khan, Azal Ahmad, Wang, Xinran, Khan, Ahmad Faraz, Ge, Shuwen, Khan, Saman Bahzad, Ahmad, Ayaan, Anwar, Ali

Generative AI can now synthesize strikingly realistic images from text, yet output quality remains highly sensitive to how prompts are phrased. Direct Preference Optimization (DPO) offers a lightweight, off-policy alternative to RL for automatic prompt engineering, but its token-level regularization leaves semantic inconsistency unchecked as prompts that win higher preference scores can still drift away from the user's intended meaning. We introduce Sem-DPO, a variant of DPO that preserves semantic consistency yet retains its simplicity and efficiency. Sem-DPO adjusts the DPO loss using a weight based on how different the winning prompt is from the original, reducing the impact of training examples that are semantically misaligned. We provide the first analytical bound on semantic drift for preference-tuned prompt generators, showing that Sem-DPO keeps learned prompts within a provably bounded neighborhood of the original text. On three standard text-to-image prompt-optimization benchmarks and two language models, Sem-DPO achieves 8-12% higher CLIP similarity and 5-9% higher human-preference scores (HPSv2.1, PickScore) than DPO, while also outperforming state-of-the-art baselines. These findings suggest that strong flat baselines augmented with semantic weighting should become the new standard for prompt-optimization studies and lay the groundwork for broader, semantics-aware preference optimization in language models.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

2507.20133

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Tamboli, Dipesh, Chakraborty, Souradip, Malusare, Aditya, Banerjee, Biplab, Bedi, Amrit Singh, Aggarwal, Vaneet

BalancedDPO: Adaptive Multi-Metric Alignment

arXiv.org Artificial IntelligenceMar-16-2025

Text-to-image (T2I) diffusion models have made remarkable advancements, yet aligning them with diverse preferences remains a persistent challenge. Current methods often optimize single metrics or depend on narrowly curated datasets, leading to overfitting and limited generalization across key visual quality metrics. We present BalancedDPO, a novel extension of Direct Preference Optimization (DPO) that addresses these limitations by simultaneously aligning T2I diffusion models with multiple metrics, including human preference, CLIP score, and aesthetic quality. Our key novelty lies in aggregating consensus labels from diverse metrics in the preference distribution space as compared to existing reward mixing approaches, enabling robust and scalable multi-metric alignment while maintaining the simplicity of the standard DPO pipeline that we refer to as BalancedDPO. Our evaluations on the Pick-a-Pic, PartiPrompt and HPD datasets show that BalancedDPO achieves state-of-the-art results, outperforming existing approaches across all major metrics. BalancedDPO improves the average win rates by 15%, 7.1%, and 10.3% on Pick-a-pic, PartiPrompt and HPD, respectively, from the DiffusionDPO.

artificial intelligence, diffusiondpo, machine learning, (16 more...)

2503.12575

Country: North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation (0.46)
Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Neural Information Processing SystemsJan-19-2025, 06:54:02 GMT

Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. Using this web app we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users' preferences over generated images. We leverage this dataset to train a CLIP-based scoring function, PickScore, which exhibits superhuman performance on the task of predicting human preferences. Then, we test PickScore's ability to perform model evaluation and observe that it correlates better with human rankings than other automatic evaluation metrics.

pick-a-pic, pickscore, text-to-image generation, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.45)

arXiv.org Artificial IntelligenceJan-10-2025

Alignment without Over-optimization: Training-Free Solution for Diffusion Models

Kim, Sunwoo, Kim, Minkyu, Park, Dongmin

Diffusion models excel in generative tasks, but aligning them with specific objectives while maintaining their versatility remains challenging. Existing fine-tuning methods often suffer from reward over-optimization, while approximate guidance approaches fail to optimize target rewards effectively. Addressing these limitations, we propose a training-free sampling method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution. Our approach, tailored for diffusion sampling and incorporating tempering techniques, achieves comparable or superior target rewards to fine-tuning methods while preserving diversity and cross-reward generalization. We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization. This work offers a robust solution for aligning diffusion models with diverse downstream objectives without compromising their general capabilities. Code is available at https://github.com/krafton-ai/DAS .

artificial intelligence, das, machine learning, (17 more...)