AITopics

Country: Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems Feb-12-2026, 07:58:49 GMT

dcd18e50ebca0af89187c6e35dabb584-Supplemental-Datasets_and_Benchmarks.pdf

It is therefore necessary to train the model to learn the characteristics that best encode the various aspects of aesthetics.

artificial intelligence, dataset, machine learning, (17 more...)

Country:

North America > Canada (0.04)
Asia > Vietnam > Long An Province (0.04)

Industry:

Media > Photography (0.47)
Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Neural Information Processing Systems Feb-12-2026, 07:58:46 GMT

dcd18e50ebca0af89187c6e35dabb584-Paper-Datasets_and_Benchmarks.pdf

critique, dataset, sentiment score, (12 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.05)
North America > Canada > Newfoundland and Labrador > Labrador (0.04)

Genre: Research Report (0.46)

Industry:

Media > Photography (0.69)
Information Technology > Services (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Artificial Intelligence Dec-5-2025

Value Gradient Guidance for Flow Matching Alignment

Liu, Zhen, Xiao, Tim Z., Domingo-Enrich, Carles, Liu, Weiyang, Zhang, Dinghuai

While methods exist for aligning flow matching models--a popular and effective class of generative models--with human preferences, existing approaches fail to achieve both adaptation efficiency and probabilistically sound prior preservation. In this work, we leverage the theory of optimal control and propose VGG-Flow, a gradient-matching-based method for finetuning pretrained flow matching models. The key idea behind this algorithm is that the optimal difference between the finetuned velocity field and the pretrained one should be matched with the gradient field of a value function. This method not only incorporates first-order information from the reward model but also benefits from heuristic initialization of the value function to enable fast adaptation. Empirically, we show on a popular text-to-image flow matching model, Stable Diffusion 3, that our method can finetune flow matching models under limited computational budgets while achieving effective and prior-preserving alignment.
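The gradient-matching idea in this abstract can be illustrated with a toy loss. This is only a minimal sketch, not the authors' implementation: the quadratic value function, the `scale` constant (including its sign), and the names `value_gradient` and `gradient_matching_loss` are all assumptions made for illustration, since the abstract does not give the exact objective.

```python
import numpy as np


def value_gradient(x, target):
    """Gradient of a toy value function V(x) = 0.5 * ||x - target||^2 (hypothetical)."""
    return x - target


def gradient_matching_loss(v_finetuned, v_pretrained, grad_value, scale=-1.0):
    """Mean squared mismatch between the velocity difference and scale * grad V.

    The sign and magnitude of `scale` are assumptions; the abstract only says
    the difference should be matched with the gradient field of a value function.
    """
    residual = (v_finetuned - v_pretrained) - scale * grad_value
    return float(np.mean(np.sum(residual ** 2, axis=-1)))


# Two sample points in 2D and a stand-in pretrained velocity field (all zeros).
x = np.array([[0.0, 0.0], [1.0, 2.0]])
target = np.array([1.0, 1.0])
v_pretrained = np.zeros_like(x)
grad_v = value_gradient(x, target)

# An unmodified model has a nonzero loss; a perfectly matched one has zero loss.
loss_unmodified = gradient_matching_loss(v_pretrained, v_pretrained, grad_v)
loss_matched = gradient_matching_loss(v_pretrained - grad_v, v_pretrained, grad_v)
```

In this toy setup the loss vanishes exactly when the finetuned velocity equals the pretrained velocity plus the scaled value gradient, which is the matching condition the abstract describes in words.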

artificial intelligence, deep learning, machine learning, (18 more...)

2512.05116

Country: Asia (0.28)

Genre: Research Report (0.82)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Oct-3-2025

MIRA: Towards Mitigating Reward Hacking in Inference-Time Alignment of T2I Diffusion Models

Zhai, Kevin, Singh, Utsav, Thatipelli, Anirudh, Chakraborty, Souradip, Sahu, Anit Kumar, Huang, Furong, Bedi, Amrit Singh, Shah, Mubarak

Diffusion models excel at generating images conditioned on text prompts, but the resulting images often do not satisfy user-specific criteria measured by scalar rewards such as Aesthetic Scores. This alignment typically requires fine-tuning, which is computationally demanding. Recently, inference-time alignment via noise optimization has emerged as an efficient alternative, modifying initial input noise to steer the diffusion denoising process towards generating high-reward images. However, this approach suffers from reward hacking, where the model produces images that score highly, yet deviate significantly from the original prompt. We show that noise-space regularization is insufficient and that preventing reward hacking requires an explicit image-space constraint. To this end, we propose MIRA (MItigating Reward hAcking), a training-free, inference-time alignment method. MIRA introduces an image-space, score-based KL surrogate that regularizes the sampling trajectory with a frozen backbone, constraining the output distribution so reward can increase without off-distribution drift (reward hacking). We derive a tractable approximation to KL using diffusion scores. Across SDv1.5 and SDXL, multiple rewards (Aesthetic, HPSv2, PickScore), and public datasets (e.g., Animal-Animal, HPDv2), MIRA achieves >60% win rate vs. strong baselines while preserving prompt adherence; mechanism plots show reward gains with near-zero drift, whereas DNO drifts as compute increases. We further introduce MIRA-DPO, mapping preference optimization to inference time with a frozen backbone, extending MIRA to non-differentiable rewards without fine-tuning.

artificial intelligence, diffusion model, machine learning, (16 more...)

2510.01549

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

arXiv.org Artificial Intelligence Aug-29-2025

Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance

Jin, Luozhijie, Qiu, Zijie, Liu, Jie, Diao, Zijie, Qiao, Lifeng, Ding, Ning, Lamb, Alex, Qiu, Xipeng

Denoising-based generative models, particularly diffusion and flow matching algorithms, have achieved remarkable success. However, aligning their output distributions with complex downstream objectives, such as human preferences, compositional accuracy, or data compressibility, remains challenging. While reinforcement learning (RL) fine-tuning methods, inspired by advances in RL from human feedback (RLHF) for large language models, have been adapted to these generative frameworks, current RL approaches are suboptimal for diffusion models and offer limited flexibility in controlling alignment strength after fine-tuning. In this work, we reinterpret RL fine-tuning for diffusion models through the lens of stochastic differential equations and implicit reward conditioning. We introduce Reinforcement Learning Guidance (RLG), an inference-time method that adapts Classifier-Free Guidance (CFG) by combining the outputs of the base and RL fine-tuned models via a geometric average. Our theoretical analysis shows that RLG's guidance scale is mathematically equivalent to adjusting the KL-regularization coefficient in standard RL objectives, enabling dynamic control over the alignment-quality trade-off without further training. Extensive experiments demonstrate that RLG consistently improves the performance of RL fine-tuned models across various architectures, RL algorithms, and downstream tasks, including human preferences, compositional control, compressibility, and text rendering. Furthermore, RLG supports both interpolation and extrapolation, thereby offering unprecedented flexibility in controlling generative alignment. Our approach provides a practical and theoretically sound solution for enhancing and controlling diffusion model alignment at inference. The source code for RLG is publicly available on GitHub: https://github.com/jinluo12345/Reinforcement-learning-guidance.
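The guidance rule described in this abstract can be sketched in a few lines. This is an illustrative reading, not the released code: it assumes the geometric average is taken over the two models' densities, in which case the combined score (or velocity) is a linear blend of the two model outputs, as in CFG. The function name `rl_guidance` and the example vectors are hypothetical.

```python
import numpy as np


def rl_guidance(v_base, v_rl, scale):
    """Blend base and RL-fine-tuned model outputs, CFG-style.

    scale = 0 returns the base output, scale = 1 returns the RL output,
    values in between interpolate, and values above 1 extrapolate.
    """
    v_base = np.asarray(v_base, dtype=float)
    v_rl = np.asarray(v_rl, dtype=float)
    return v_base + scale * (v_rl - v_base)


# Stand-in model outputs at one denoising step (hypothetical values).
v_base = np.array([1.0, 0.0])
v_rl = np.array([3.0, 2.0])

interpolated = rl_guidance(v_base, v_rl, 0.5)  # halfway between the two models
extrapolated = rl_guidance(v_base, v_rl, 2.0)  # pushes past the RL model
```

Because the scale is applied only at sampling time, the same pair of checkpoints can be reused across alignment strengths without retraining, which is the flexibility the abstract emphasizes.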

arxiv preprint arxiv, machine learning, reinforcement learning, (16 more...)

2508.21016

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Neural Information Processing Systems Aug-19-2025, 10:39:55 GMT

dcd18e50ebca0af89187c6e35dabb584-Paper-Datasets_and_Benchmarks.pdf

artificial intelligence, machine learning, natural language, (19 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.05)
North America > Canada > Newfoundland and Labrador > Labrador (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.46)

Industry:

Media > Photography (0.69)
Information Technology > Services (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence Jul-9-2025

Structured Captions Improve Prompt Adherence in Text-to-Image Models (Re-LAION-Caption 19M)

Merchant, Nicholas, Borde, Haitz Sáez de Ocáriz, Popescu, Andrei Cristian, Suarez, Carlos Garcia Jurado

We argue that generative text-to-image models often struggle with prompt adherence due to the noisy and unstructured nature of large-scale datasets like LAION-5B. This forces users to rely heavily on prompt engineering to elicit desirable outputs. In this work, we propose that enforcing a consistent caption structure during training can significantly improve model controllability and alignment. We introduce Re-LAION-Caption 19M, a high-quality subset of Re-LAION-5B, comprising 19 million 1024x1024 images with captions generated by a Mistral 7B Instruct-based LLaVA-Next model. Each caption follows a four-part template: subject, setting, aesthetics, and camera details. We fine-tune PixArt-Σ and Stable Diffusion 2 using both structured and randomly shuffled captions, and show that structured versions consistently yield higher text-image alignment scores using visual question answering (VQA) models. The dataset is publicly available at https://huggingface.co/datasets/supermodelresearch/Re-LAION-Caption19M.
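The four-part template and the shuffled baseline described in this abstract can be mocked up as follows. This is a rough sketch under assumptions: the class name `StructuredCaption`, the sentence separator, and the example caption text are hypothetical and are not taken from the dataset.

```python
import random
from dataclasses import dataclass


@dataclass
class StructuredCaption:
    """One caption split into the four template parts named in the abstract."""
    subject: str
    setting: str
    aesthetics: str
    camera: str

    def parts(self):
        # Fixed template order: subject, setting, aesthetics, camera details.
        return [self.subject, self.setting, self.aesthetics, self.camera]

    def structured(self):
        """Caption with the parts in template order."""
        return ". ".join(self.parts()) + "."

    def shuffled(self, seed):
        """Same parts in a random order (the comparison condition)."""
        parts = self.parts()
        random.Random(seed).shuffle(parts)
        return ". ".join(parts) + "."


cap = StructuredCaption(
    subject="A red fox",
    setting="Snowy forest at dawn",
    aesthetics="Soft, muted colors",
    camera="Shot on an 85mm lens",
)
ordered_caption = cap.structured()
shuffled_caption = cap.shuffled(seed=0)
```

Both variants carry identical information; only the ordering differs, which is what lets the comparison isolate the effect of caption structure.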

large language model, machine learning, natural language, (21 more...)

2507.053

Genre: Research Report (0.43)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Jun-19-2025

Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards

Liu, Qingming, Liu, Zhen, Zhang, Dinghuai, Jia, Kui

Generating high-quality and photorealistic 3D assets remains a longstanding challenge in 3D vision and computer graphics. Although state-of-the-art generative models, such as diffusion models, have made significant progress in 3D generation, they often fall short of human-designed content due to limited ability to follow instructions, align with human preferences, or produce realistic textures, geometries, and physical attributes. In this paper, we introduce Nabla-R2D3, a highly effective and sample-efficient reinforcement learning alignment framework for 3D-native diffusion models using 2D rewards. Built upon the recently proposed Nabla-GFlowNet method, which matches the score function to reward gradients in a principled manner for reward finetuning, our Nabla-R2D3 enables effective adaptation of 3D diffusion models using only 2D reward signals. Extensive experiments show that, unlike vanilla finetuning baselines which either struggle to converge or suffer from reward hacking, Nabla-R2D3 consistently achieves higher rewards and reduced prior forgetting within a few finetuning steps.

diffusion model, machine learning, reinforcement learning, (17 more...)