AITopics | Tang, Wenpin

Collaborating Authors

Tang, Wenpin

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Fine-Tuning Diffusion Generative Models via Rich Preference Optimization

Zhao, Hanyang, Chen, Haoxian, Guo, Yucheng, Winata, Genta Indra, Ou, Tingting, Huang, Ziyu, Yao, David D., Tang, Wenpin

arXiv.org Artificial IntelligenceMar-13-2025

We introduce Rich Preference Optimization (RPO), a novel pipeline that leverages rich feedback signals to improve the curation of preference pairs for fine-tuning text-to-image diffusion models. Traditional methods, like Diffusion-DPO, often rely solely on reward model labeling, which can be opaque, offer limited insights into the rationale behind preferences, and are prone to issues such as reward hacking or overfitting. In contrast, our approach begins with generating detailed critiques of synthesized images to extract reliable and actionable image editing instructions. By implementing these instructions, we create refined images, resulting in synthetic, informative preference pairs that serve as enhanced tuning datasets. We demonstrate the effectiveness of our pipeline and the resulting datasets in fine-tuning state-of-the-art diffusion models.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.1172

Genre: Research Report (0.64)

Industry: Media (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning

Zhao, Hanyang, Chen, Haoxian, Zhang, Ji, Yao, David D., Tang, Wenpin

arXiv.org Artificial IntelligenceFeb-3-2025

Reinforcement learning from human feedback (RLHF), which aligns a diffusion model with input prompt, has become a crucial step in building reliable generative AI models. Most works in this area use a discrete-time formulation, which is prone to induced errors, and often not applicable to models with higher-order/black-box solvers. The objective of this study is to develop a disciplined approach to fine-tune diffusion models using continuous-time RL, formulated as a stochastic control problem with a reward function that aligns the end result (terminal state) with input prompt. The key idea is to treat score matching as controls or actions, and thereby making connections to policy optimization and regularization in continuous-time RL. To carry out this idea, we lay out a new policy optimization framework for continuous-time RL, and illustrate its potential in enhancing the value networks design space via leveraging the structural property of diffusion models. We validate the advantages of our method by experiments in downstream tasks of fine-tuning large-scale Text2Image models of Stable Diffusion v1.5.

diffusion model, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2502.01819

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Regret of exploratory policy improvement and $q$-learning

Tang, Wenpin, Zhou, Xun Yu

arXiv.org Artificial IntelligenceNov-2-2024

We study the convergence of $q$-learning and related algorithms introduced by Jia and Zhou (J. Mach. Learn. Res., 24 (2023), 161) for controlled diffusion processes. Under suitable conditions on the growth and regularity of the model parameters, we provide a quantitative error and regret analysis of both the exploratory policy improvement algorithm and the $q$-learning algorithm.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2411.01302

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Zhao, Hanyang, Winata, Genta Indra, Das, Anirban, Zhang, Shi-Xiong, Yao, David D., Tang, Wenpin, Sahu, Sambit

arXiv.org Artificial IntelligenceOct-5-2024

Recently, numerous preference optimization algorithms have been introduced as extensions to the Direct Preference Optimization (DPO) family. While these methods have successfully aligned models with human preferences, there is a lack of understanding regarding the contributions of their additional components. Moreover, fair and consistent comparisons are scarce, making it difficult to discern which components genuinely enhance downstream performance. In this work, we propose RainbowPO, a unified framework that demystifies the effectiveness of existing DPO methods by categorizing their key components into seven broad directions. We integrate these components into a single cohesive objective, enhancing the performance of each individual element. Through extensive experiments, we demonstrate that RainbowPO outperforms existing DPO variants. Additionally, we provide insights to guide researchers in developing new DPO methods and assist practitioners in their implementations.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.04203

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)

Add feedback

Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions

Chen, Haoxian, Zhao, Hanyang, Lam, Henry, Yao, David, Tang, Wenpin

arXiv.org Machine LearningMay-23-2024

Direct Preference Optimization (DPO) has recently emerged as a popular approach to improve reinforcement learning with human feedback (RLHF), leading to better techniques to fine-tune large language models (LLM). A weakness of DPO, however, lies in its lack of capability to characterize the diversity of human preferences. Inspired by Mallows' theory of preference ranking, we develop in this paper a new approach, the Mallows-DPO. A distinct feature of this approach is a dispersion index, which reflects the dispersion of human preference to prompts. We show that existing DPO models can be reduced to special cases of this dispersion index, thus unified with Mallows-DPO. More importantly, we demonstrate (empirically) how to use this dispersion index to enhance the performance of DPO in a broad array of benchmark tasks, from synthetic bandit selection to controllable generations and dialogues, while maintaining great generalization capabilities.

large language model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2405.14953

Country: North America > United States (0.28)

Genre:

Instructional Material > Course Syllabus & Notes (0.68)
Research Report > New Finding (0.46)

Industry: Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond

Tang, Wenpin

arXiv.org Artificial IntelligenceMar-12-2024

This paper aims to develop and provide a rigorous treatment to the problem of entropy regularized fine-tuning in the context of continuous-time diffusion models, which was recently proposed by Uehara et al. (arXiv:2402.15194, 2024). The idea is to use stochastic control for sample generation, where the entropy regularizer is introduced to mitigate reward collapse. We also show how the analysis can be extended to fine-tuning involving a general $f$-divergence regularizer.

artificial intelligence, diffusion model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2403.06279

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial

Tang, Wenpin, Zhao, Hanyang

arXiv.org Artificial IntelligenceFeb-12-2024

This is an expository article on the score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDE). After a gentle introduction, we discuss the two pillars in the diffusion modeling -- sampling and score matching, which encompass the SDE/ODE sampling, score matching efficiency, the consistency model, and reinforcement learning. Short proofs are given to illustrate the main idea of the stated results. The article is primarily for introducing the beginners to the field, and practitioners may also find some analysis useful in designing new models or algorithms.

artificial intelligence, diffusion model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2402.07487

Country: North America > United States (0.14)

Genre: Instructional Material (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Contractive Diffusion Probabilistic Models

Tang, Wenpin, Zhao, Hanyang

arXiv.org Artificial IntelligenceJan-23-2024

Diffusion probabilistic models (DPMs) have emerged as a promising technology in generative modeling. The success of DPMs relies on two ingredients: time reversal of Markov diffusion processes and score matching. Most existing work implicitly assumes that score matching is close to perfect, while this assumption is questionable. In view of possibly unguaranteed score matching, we propose a new criterion -- the contraction of backward sampling in the design of DPMs. This leads to a novel class of contractive DPMs (CDPMs), including contractive Ornstein-Uhlenbeck (OU) processes and contractive sub-variance preserving (sub-VP) stochastic differential equations (SDEs). The key insight is that the contraction in the backward process narrows score matching errors, as well as discretization error. Thus, the proposed CDPMs are robust to both sources of error. Our proposal is supported by theoretical results, and is corroborated by experiments. Notably, contractive sub-VP shows the best performance among all known SDE-based DPMs on the CIFAR-10 dataset.

artificial intelligence, machine learning, survey article, (18 more...)

arXiv.org Artificial Intelligence

2401.13115

Country:

North America > United States (0.14)
Europe (0.14)

Genre:

Research Report (0.82)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.84)

Add feedback

Policy Optimization for Continuous Reinforcement Learning

Zhao, Hanyang, Tang, Wenpin, Yao, David D.

arXiv.org Artificial IntelligenceOct-18-2023

We study reinforcement learning (RL) in the setting of continuous time and space, for an infinite horizon with a discounted objective and the underlying dynamics driven by a stochastic differential equation. Built upon recent advances in the continuous approach to RL, we develop a notion of occupation time (specifically for a discounted objective), and show how it can be effectively used to derive performance-difference and local-approximation formulas. We further extend these results to illustrate their applications in the PG (policy gradient) and TRPO/PPO (trust region policy optimization/ proximal policy optimization) methods, which have been familiar and powerful tools in the discrete RL setting but under-developed in continuous RL. Through numerical experiments, we demonstrate the effectiveness and advantages of our approach.

artificial intelligence, continuous reinforcement learning, policy optimization

arXiv.org Artificial Intelligence

2305.18901

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Add feedback

Simulated annealing from continuum to discretization: a convergence analysis via the Eyring--Kramers law

Tang, Wenpin, Zhou, Xun Yu

arXiv.org Machine LearningFeb-9-2021

We study the convergence rate of continuous-time simulated annealing $(X_t; \, t \ge 0)$ and its discretization $(x_k; \, k =0,1, \ldots)$ for approximating the global optimum of a given function $f$. We prove that the tail probability $\mathbb{P}(f(X_t) > \min f +\delta)$ (resp. $\mathbb{P}(f(x_k) > \min f +\delta)$) decays polynomial in time (resp. in cumulative step size), and provide an explicit rate as a function of the model parameters. Our argument applies the recent development on functional inequalities for the Gibbs measure at low temperatures -- the Eyring-Kramers law. In the discrete setting, we obtain a condition on the step size to ensure the convergence.

artificial intelligence, inequality, survey article, (18 more...)

arXiv.org Machine Learning

2102.02339

Country:

North America > United States (0.14)
Europe (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.87)

Add feedback