AITopics | exp null 1

Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generative models to maximize a given reward function, these approaches require extensive computational resources and may not generalize well across different objectives. In this work, we propose a novel alignment framework by leveraging the underlying nature of the alignment problem -- sampling from reward-weighted distributions -- and show that it applies to both diffusion models (via score guidance) and flow matching models (via velocity guidance). The score function (velocity field) required for the reward-weighted distribution can be decomposed into the pre-trained score (velocity field) plus a conditional expectation of the reward. For the alignment on the diffusion model, we identify a fundamental challenge: the adversarial nature of the guidance term can introduce undesirable artifacts in the generated images. Therefore, we propose a finetuning-free framework that trains a guidance network to estimate the conditional expectation of the reward. We achieve comparable performance to finetuning-based models with one-step generation with at least a 60% reduction in computational cost. For the alignment on flow matching, we propose a training-free framework that improves the generation quality without additional computational cost.

artificial intelligence, diffusion model, machine learning, (18 more...)

arXiv.org Machine Learning

2602.00413

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New York (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (1.00)

Industry: Media > Photography (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Test-driven Reinforcement Learning in Continuous Control

Yu, Zhao, Wu, Xiuping, Ke, Liangjun

arXiv.org Artificial IntelligenceDec-10-2025

Reinforcement learning (RL) has been recognized as a powerful tool for robot control tasks. RL typically employs reward functions to define task objectives and guide agent learning. However, since the reward function serves the dual purpose of defining the optimal goal and guiding learning, it is challenging to design the reward function manually, which often results in a suboptimal task representation. To tackle the reward design challenge in RL, inspired by the satisficing theory, we propose a Test-driven Reinforcement Learning (TdRL) framework. In the TdRL framework, multiple test functions are used to represent the task objective rather than a single reward function. Test functions can be categorized as pass-fail tests and indicative tests, each dedicated to defining the optimal objective and guiding the learning process, respectively, thereby making defining tasks easier. Building upon such a task definition, we first prove that if a trajectory return function assigns higher returns to trajectories closer to the optimal trajectory set, maximum entropy policy optimization based on this return function will yield a policy that is closer to the optimal policy set. Then, we introduce a lexicographic heuristic approach to compare the relative distance relationship between trajectories and the optimal trajectory set for learning the trajectory return function. Furthermore, we develop an algorithm implementation of TdRL. Experimental results on the DeepMind Control Suite benchmark demonstrate that TdRL matches or outperforms handcrafted reward methods in policy training, with greater design simplicity and inherent support for multi-objective optimization. We argue that TdRL offers a novel perspective for representing task objectives, which could be helpful in addressing the reward design challenges in RL applications.

machine learning, reinforcement learning, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2511.07904

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions

Matrenok, Simon, Moalla, Skander, Gulcehre, Caglar

arXiv.org Artificial IntelligenceDec-2-2025

Aligning large language models with pointwise absolute rewards has so far required online, on-policy algorithms such as PPO and GRPO. In contrast, simpler methods that can leverage offline or off-policy data, such as DPO and REBEL, are limited to learning from preference pairs or relative signals. To bridge this gap, we introduce Quantile Reward Policy Optimization (QRPO), which learns from pointwise absolute rewards while preserving the simplicity and offline applicability of DPO-like methods. QRPO uses quantile rewards to enable regression to the closed-form solution of the KL-regularized RL objective. This reward yields an analytically tractable partition function, removing the need for relative signals to cancel this term. Moreover, QRPO scales with increased compute to estimate quantile rewards, opening a new dimension for pre-computation scaling. Empirically, QRPO consistently achieves top performance on chat and coding evaluations--reward model scores, AlpacaEval 2, and LeetCode--compared to DPO, REBEL, and SimPO across diverse datasets and 8B-scale models. Finally, we find that training with robust rewards instead of converting them to preferences induces less length bias.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2507.08068

Country: North America > United States (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Banking & Finance (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Score-based Generative Neural Networks for Large-Scale Optimal Transport

Daniels, Mara, Maunu, Tyler, Hand, Paul

arXiv.org Artificial IntelligenceOct-28-2025

We consider the fundamental problem of sampling the optimal transport coupling between given source and target distributions. In certain cases, the optimal transport plan takes the form of a one-to-one mapping from the source support to the target support, but learning or even approximating such a map is computationally challenging for large and high-dimensional datasets due to the high cost of linear programming routines and an intrinsic curse of dimensionality. We study instead the Sinkhorn problem, a regularized form of optimal transport whose solutions are couplings between the source and the target distribution. We introduce a novel framework for learning the Sinkhorn coupling between two distributions in the form of a score-based generative model. Conditioned on source data, our procedure iterates Langevin Dynamics to sample target data according to the regularized optimal coupling. Key to this approach is a neural network parametrization of the Sinkhorn problem, and we prove convergence of gradient descent with respect to network parameters in this formulation. We demonstrate its empirical success on a variety of large scale optimal transport tasks.

artificial intelligence, machine learning, optimal transport, (18 more...)

arXiv.org Artificial Intelligence

2110.03237

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Add feedback

Filters

Collaborating Authors

exp null 1

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

4b3cc0d1c897ebcf71aca92a4a26ac83-Supplemental-Conference.pdf

Stepwise Alignment for Constrained Language Model Policy Optimization Akifumi Wachi Thien Q. Tran Rei Sato Takumi Tanabe Y ouhei Akimoto L Y Corporation University of Tsukuba

e5b6eb1dbabff82838d5e99f62de37c8-Paper-Conference.pdf

c2e4cebba2fdb3dac7d2022421062765-Supplemental-Conference.pdf

da75d2bbf862b86f10241d0887613b41-Supplemental-Conference.pdf

8c263f70550cc7d69dba3fc170a23e77-Supplemental-Conference.pdf

Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation

Test-driven Reinforcement Learning in Continuous Control

Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions

Score-based Generative Neural Networks for Large-Scale Optimal Transport