AITopics | Fan, Ying

Collaborating Authors

Fan, Ying

VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data

Zeng, Thomas, Zhang, Shuibai, Wu, Shutong, Classen, Christian, Chae, Daewon, Ewer, Ethan, Lee, Minjae, Kim, Heeju, Kang, Wonjun, Kunde, Jackson, Fan, Ying, Kim, Jungtaek, Koo, Hyung Il, Ramchandran, Kannan, Papailiopoulos, Dimitris, Lee, Kangwook

arXiv.org Artificial Intelligence, Feb-10-2025

Process Reward Models (PRMs) have proven effective at enhancing mathematical reasoning for Large Language Models (LLMs) by leveraging increased inference-time computation. However, they are predominantly trained on mathematical data, and their generalizability to non-mathematical domains has not been rigorously studied. In response, this work first shows that current PRMs have poor performance in other domains. To address this limitation, we introduce VersaPRM, a multi-domain PRM trained on synthetic reasoning data generated using our …

In particular, Outcome Reward Models (ORMs) are used to provide supervision based solely on the correctness of the final outcome. However, ORMs fail to address errors in intermediate steps, limiting their effectiveness for complex, multi-step reasoning tasks (Luo et al., 2024; Lightman et al., 2024; Sun et al., 2024). Because ORMs suffer from this limitation, Process Reward Models (PRMs) have been proposed to offer fine-grained, step-by-step feedback on the correctness of each reasoning step (Lightman et al., 2024; Uesato et al., 2022). PRMs have proven highly effective during inference, improving the reranking of generated solutions and guiding LLMs through search-based algorithms (Wan et al., 2024; Wang et al., 2024a).
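
The reranking use of a PRM described above can be illustrated with a minimal best-of-N sketch. It assumes each candidate solution has already been split into steps and scored by some PRM; the scores below are made-up placeholders, not VersaPRM outputs. Minimum and mean aggregation of step scores are two common choices shown for illustration, not necessarily what the authors use.

```python
from typing import Callable, Sequence

# Per-step correctness probabilities from some PRM (hypothetical input format).
StepScores = Sequence[float]


def aggregate_min(scores: StepScores) -> float:
    """Score a solution by its weakest step."""
    return min(scores)


def aggregate_mean(scores: StepScores) -> float:
    """Score a solution by its average step score."""
    return sum(scores) / len(scores)


def best_of_n(
    candidates: Sequence[str],
    step_scores: Sequence[StepScores],
    aggregate: Callable[[StepScores], float] = aggregate_min,
) -> str:
    """Return the candidate whose aggregated PRM score is highest."""
    if not candidates or len(candidates) != len(step_scores):
        raise ValueError("need one non-empty score list per candidate")
    best = max(range(len(candidates)), key=lambda i: aggregate(step_scores[i]))
    return candidates[best]


if __name__ == "__main__":
    answers = ["answer A", "answer B", "answer C"]
    # Placeholder scores: B has strong steps overall but one weak middle step.
    scores = [[0.70, 0.72, 0.71], [0.99, 0.30, 0.99], [0.60, 0.65, 0.90]]
    print(best_of_n(answers, scores))                  # min aggregation picks A
    print(best_of_n(answers, scores, aggregate_mean))  # mean aggregation picks B
```

The example shows why step-level feedback matters for reranking: min aggregation penalizes B's single faulty intermediate step, while mean aggregation lets its strong steps hide the error.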

large language model, machine learning, natural language

2502.06737

Country:

North America > United States > Wisconsin (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Looped Transformers for Length Generalization

Fan, Ying, Du, Yilun, Ramchandran, Kannan, Lee, Kangwook

arXiv.org Artificial Intelligence, Sep-25-2024

Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well to unseen inputs of the same length, they struggle with length generalization, i.e., handling inputs of unseen lengths. In this work, we demonstrate that looped Transformers with an adaptive number of steps significantly improve length generalization. We focus on tasks with a known iterative solution, involving multiple iterations of a RASP-L operation: a length-generalizable operation that can be expressed by a finite-sized Transformer. We train looped Transformers using our proposed learning algorithm and observe that they learn highly length-generalizable solutions for various tasks.
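
The idea of looping, reusing one fixed block a number of times that grows with the input, can be illustrated without a neural network. In this toy sketch the hypothetical `step` function stands in for the shared looped block: a fixed, local rule applied to every position in parallel. The loop count defaults to the input length. It illustrates adaptive-depth iteration on prefix parity only; it is neither the authors' training algorithm nor a RASP-L program.

```python
import random
from typing import List, Optional


def step(bits: List[int], state: List[int]) -> List[int]:
    """One pass of the shared 'block': each position updates in parallel from its
    own input bit and its left neighbour's current state (a fixed, local rule)."""
    return [b ^ (state[i - 1] if i > 0 else 0) for i, b in enumerate(bits)]


def looped_prefix_parity(bits: List[int], num_steps: Optional[int] = None) -> List[int]:
    """Apply the same step repeatedly. By default the loop count equals the input
    length, so longer inputs automatically receive more iterations."""
    steps = len(bits) if num_steps is None else num_steps
    state = [0] * len(bits)
    for _ in range(steps):
        state = step(bits, state)
    return state


if __name__ == "__main__":
    bits = [1, 0, 1, 1]
    print(looped_prefix_parity(bits))               # [1, 1, 0, 1]
    print(looped_prefix_parity(bits, num_steps=2))  # [1, 1, 1, 0]: too few loops
    for n in (5, 50, 500):  # arbitrary lengths: the loop count scales with n
        x = [random.randint(0, 1) for _ in range(n)]
        assert looped_prefix_parity(x)[-1] == sum(x) % 2
```

After k iterations the first k positions are correct, so a fixed iteration budget fails on the rightmost positions of longer inputs, while tying the loop count to the length does not.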

large language model, machine learning, natural language

2409.15647

Country: North America > United States > Wisconsin (0.14)

Genre: Research Report (0.82)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

How to Solve Contextual Goal-Oriented Problems with Offline Datasets?

Fan, Ying, Li, Jingling, Swaminathan, Adith, Modi, Aditya, Cheng, Ching-An

arXiv.org Artificial Intelligence, Aug-14-2024

We present a novel method, Contextual goal-Oriented Data Augmentation (CODA), which uses commonly available unlabeled trajectories and context-goal pairs to solve Contextual Goal-Oriented (CGO) problems. By carefully constructing an action-augmented MDP that is equivalent to the original MDP, CODA creates a fully labeled transition dataset under training contexts without additional approximation error. We conduct a novel theoretical analysis to demonstrate CODA's capability to solve CGO problems in the offline data setup. Empirical results also showcase the effectiveness of CODA, which outperforms other baseline methods across various context-goal relationships of CGO problems. This approach offers a promising direction for solving CGO problems using offline datasets.
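
One way to read the data-labeling idea in the abstract is sketched below. Unlabeled dynamics transitions are replicated under every training context with zero reward. Each context-goal pair adds a transition in which a fictitious extra action, taken at the goal state, ends the episode with reward 1. The reward values, the single fictitious action, and all names are assumptions of this sketch. The paper's actual construction, and the conditions under which it is equivalent to the original MDP, may differ in detail.

```python
from dataclasses import dataclass
from typing import Hashable, List, Sequence, Tuple

# Hypothetical placeholders for the fictitious action and absorbing terminal state.
GOAL_ACTION = "declare_goal"
TERMINAL = "terminal"


@dataclass(frozen=True)
class Transition:
    context: Hashable
    state: Hashable
    action: Hashable
    reward: float
    next_state: Hashable
    done: bool


def build_labeled_dataset(
    dynamics_data: Sequence[Tuple[Hashable, Hashable, Hashable]],
    context_goal_pairs: Sequence[Tuple[Hashable, Hashable]],
) -> List[Transition]:
    """Turn reward-free (s, a, s') data plus (context, goal) pairs into a fully
    labeled contextual dataset over an action-augmented MDP (sketch)."""
    # Training contexts, de-duplicated while keeping first-seen order.
    contexts = list(dict.fromkeys(c for c, _ in context_goal_pairs))
    labeled: List[Transition] = []
    # Real transitions: reward 0 under every training context.
    for s, a, s_next in dynamics_data:
        for c in contexts:
            labeled.append(Transition(c, s, a, 0.0, s_next, False))
    # Augmented transitions: from a goal state, the fictitious action ends the
    # episode with reward 1 under the paired context.
    for c, g in context_goal_pairs:
        labeled.append(Transition(c, g, GOAL_ACTION, 1.0, TERMINAL, True))
    return labeled


if __name__ == "__main__":
    trajectories = [("s0", "right", "s1"), ("s1", "right", "s2")]
    pairs = [("rainy", "s2"), ("sunny", "s1")]
    for t in build_labeled_dataset(trajectories, pairs):
        print(t)
```

Any off-the-shelf offline RL algorithm could then be trained on the resulting labeled dataset. No reward model has to be learned for the unlabeled trajectories.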

artificial intelligence, machine learning, reinforcement learning

2408.07753

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Domain Generalization via Nuclear Norm Regularization

Shi, Zhenmei, Ming, Yifei, Fan, Ying, Sala, Frederic, Liang, Yingyu

arXiv.org Artificial Intelligence, Dec-4-2023

The ability to generalize to unseen domains is crucial for machine learning systems deployed in the real world, especially when we only have data from limited training domains. In this paper, we propose a simple and effective regularization method for domain generalization based on the nuclear norm of the learned features. Intuitively, the proposed regularizer mitigates the impact of environmental features and encourages learning domain-invariant features. Theoretically, we provide insights into why nuclear norm regularization is more effective than empirical risk minimization (ERM) and alternative regularization methods. Empirically, we conduct extensive experiments on both synthetic and real datasets. We show that nuclear norm regularization achieves strong performance compared to baselines in a wide range of domain generalization tasks. Moreover, our regularizer is broadly applicable to various methods, such as ERM and SWAD, with consistently improved performance, e.g., test accuracy improvements of 1.7% and 0.9%, respectively, on the DomainBed benchmark.
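
The regularized objective described above amounts to adding a multiple of the nuclear norm (the sum of singular values) of the batch feature matrix to the task loss. The NumPy sketch below only evaluates that value. In training, the same term would be computed inside an autodiff framework so it can be backpropagated. The weight `lam` and the function names are hypothetical.

```python
import numpy as np


def nuclear_norm(features: np.ndarray) -> float:
    """Sum of singular values of a (batch_size, feature_dim) feature matrix."""
    return float(np.linalg.norm(features, ord="nuc"))


def regularized_loss(task_loss: float, features: np.ndarray, lam: float = 0.01) -> float:
    """ERM objective plus lam * ||Z||_* (lam is a hypothetical default)."""
    return task_loss + lam * nuclear_norm(features)


if __name__ == "__main__":
    low_rank = np.array([[1.0, 0.0], [1.0, 0.0]])   # rank 1, Frobenius norm sqrt(2)
    full_rank = np.array([[1.0, 0.0], [0.0, 1.0]])  # rank 2, Frobenius norm sqrt(2)
    print(nuclear_norm(low_rank))   # approximately 1.414
    print(nuclear_norm(full_rank))  # approximately 2.0
```

At equal Frobenius norm, the penalty is smaller for the low-rank feature matrix. This is one intuition for why it discourages spreading features over many extra directions.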

artificial intelligence, generalization, machine learning

2303.07527

Country:

Asia > Middle East > Israel (0.14)
North America > United States > Wisconsin (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

Fan, Ying, Watkins, Olivia, Du, Yuqing, Liu, Hao, Ryu, Moonkyung, Boutilier, Craig, Abbeel, Pieter, Ghavamzadeh, Mohammad, Lee, Kangwook, Lee, Kimin

arXiv.org Artificial Intelligence, Nov-1-2023

Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. Even though relatively simple approaches (e.g., rejection sampling based on reward scores) have been investigated, fine-tuning text-to-image models with the reward function remains challenging. In this work, we propose using online reinforcement learning (RL) to fine-tune text-to-image models. We focus on diffusion models, defining the fine-tuning task as an RL problem, and updating the pre-trained text-to-image diffusion models using policy gradient to maximize the feedback-trained reward. Our approach, coined DPOK, integrates policy optimization with KL regularization. We conduct an analysis of KL regularization for both RL fine-tuning and supervised fine-tuning. In our experiments, we show that DPOK is generally superior to supervised fine-tuning with respect to both image-text alignment and image quality. Our code is available at https://github.com/google-research/google-research/tree/master/dpok.
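
The RL objective with KL regularization described above can be written schematically as follows. The notation is assumed for this sketch: $z$ is a text prompt, $x_T, \dots, x_0$ a denoising trajectory, $r$ the feedback-trained reward, $p_\theta$ the model being fine-tuned, $p_{\mathrm{pre}}$ the pretrained model, and $\beta$ the KL weight. The inequality is the standard chain-rule bound, valid when both chains start from the same fixed prior over $x_T$. It replaces the intractable KL on the final image with a sum of per-step KLs. The gradient is the REINFORCE form for the trajectory under that same fixed prior. See the paper for the exact form DPOK optimizes.

```latex
\begin{align*}
&\min_{\theta}\; \mathbb{E}_{z}\Big[ -\,\mathbb{E}_{x_0 \sim p_\theta(\cdot \mid z)}\big[r(x_0, z)\big]
  + \beta\, \mathrm{KL}\big(p_\theta(x_0 \mid z)\,\|\,p_{\mathrm{pre}}(x_0 \mid z)\big) \Big] \\
&\mathrm{KL}\big(p_\theta(x_0 \mid z)\,\|\,p_{\mathrm{pre}}(x_0 \mid z)\big)
  \le \sum_{t=1}^{T} \mathbb{E}_{x_t \sim p_\theta(\cdot \mid z)}
  \Big[\mathrm{KL}\big(p_\theta(x_{t-1} \mid x_t, z)\,\|\,p_{\mathrm{pre}}(x_{t-1} \mid x_t, z)\big)\Big] \\
&\nabla_\theta\, \mathbb{E}\big[r(x_0, z)\big]
  = \mathbb{E}\Big[ r(x_0, z) \sum_{t=1}^{T} \nabla_\theta \log p_\theta(x_{t-1} \mid x_t, z) \Big]
\end{align*}
```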

artificial intelligence, fine-tuning text-to-image diffusion model, machine learning

2305.16381

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Optimizing DDPM Sampling with Shortcut Fine-Tuning

Fan, Ying, Lee, Kangwook

arXiv.org Artificial Intelligence, May-24-2023

In this study, we propose Shortcut Fine-Tuning (SFT), a new approach to the challenge of fast sampling from pretrained Denoising Diffusion Probabilistic Models (DDPMs). SFT advocates fine-tuning DDPM samplers through the direct minimization of Integral Probability Metrics (IPMs), instead of learning the backward diffusion process. This enables samplers to discover an alternative, more efficient sampling shortcut that deviates from the backward diffusion process. Inspired by a control perspective, we propose a new algorithm, SFT-PG (Shortcut Fine-Tuning with Policy Gradient), and prove that, under certain assumptions, gradient descent of diffusion models with respect to the IPM is equivalent to performing policy gradient. To the best of our knowledge, this is the first attempt to utilize reinforcement learning (RL) methods to train diffusion models. Through empirical evaluation, we demonstrate that our fine-tuning method can further enhance existing fast DDPM samplers, resulting in sample quality comparable to, or even surpassing, that of the full-step model across various datasets.
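
An IPM measures the distance between two distributions P and Q as the largest gap in expectation, sup over f in F of |E_P[f] - E_Q[f]|, over a class F of test functions. SFT's own IPM and critic are not reproduced here. As a stand-in, the sketch below computes the squared maximum mean discrepancy (MMD), the IPM obtained when F is the unit ball of a kernel's RKHS, between 1-D data samples and sampler outputs. The Gaussian kernel, the bandwidth, and the sample values are assumptions. A fine-tuning loop in this spirit would adjust sampler parameters to drive such a quantity down.

```python
import math
from typing import Sequence


def gaussian_kernel(x: float, y: float, bandwidth: float) -> float:
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth**2))


def mmd_squared(p: Sequence[float], q: Sequence[float], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two 1-D samples.

    MMD is the IPM whose test-function class is the unit ball of the kernel's
    RKHS; this estimate is non-negative up to rounding and 0 for identical samples.
    """

    def mean_k(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(gaussian_kernel(x, y, bandwidth) for x in a for y in b) / (len(a) * len(b))

    return mean_k(p, p) + mean_k(q, q) - 2.0 * mean_k(p, q)


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.2]
    good_sampler = [0.05, 0.0, -0.05, 0.15]
    poor_sampler = [3.0, 3.1, 2.9, 3.2]
    print(mmd_squared(data, good_sampler) < mmd_squared(data, poor_sampler))  # True
```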

artificial intelligence, gradient, machine learning

2301.13362

Country: North America > United States (0.28)

Genre:

Workflow (0.93)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Score-based Generative Modeling Secretly Minimizes the Wasserstein Distance

Kwon, Dohyun, Fan, Ying, Lee, Kangwook

arXiv.org Artificial Intelligence, Dec-12-2022

Score-based generative models have been shown to achieve remarkable empirical performance in various applications such as image generation and audio synthesis. However, the theoretical understanding of score-based diffusion models is still incomplete. Recently, Song et al. showed that the training objective of score-based generative models is equivalent to minimizing the Kullback-Leibler divergence of the generated distribution from the data distribution. In this work, we show that, under suitable assumptions on the model, score-based models also minimize the Wasserstein distance between these two distributions. Specifically, we prove that the Wasserstein distance is upper bounded by the square root of the objective function up to multiplicative constants and a fixed constant offset. Our proof is based on a novel application of the theory of optimal transport, which may be of independent interest to the community. Our numerical experiments support our findings. By analyzing our upper bounds, we provide a few techniques to obtain tighter upper bounds.
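
For intuition about the quantity being bounded, consider two one-dimensional empirical distributions with the same number of equally weighted samples. There the optimal transport plan simply matches sorted samples, so the 2-Wasserstein distance has a closed form. The sketch below computes it. It illustrates the metric only and does not reproduce the paper's bound or its constants.

```python
import math
from typing import Sequence


def wasserstein2_1d(x: Sequence[float], y: Sequence[float]) -> float:
    """W2 between two equal-size, equally weighted 1-D empirical distributions.

    In one dimension the optimal coupling matches the i-th smallest point of x with
    the i-th smallest point of y, so no optimization is required.
    """
    if len(x) != len(y) or not x:
        raise ValueError("this closed form assumes equal, non-zero sample sizes")
    xs, ys = sorted(x), sorted(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


if __name__ == "__main__":
    print(wasserstein2_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))  # 1.0: every point shifts by 1
```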

artificial intelligence, machine learning, wasserstein distance

2212.06359

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction

Qi, Pi, Zhu, Xiaoqiang, Zhou, Guorui, Zhang, Yujing, Wang, Zhe, Ren, Lejian, Fan, Ying, Gai, Kun

arXiv.org Machine Learning, Jun-28-2020

Rich user behavior data has been proven to be of great value for click-through rate (CTR) prediction tasks, especially in industrial applications such as recommender systems and online advertising. Both industry and academia have paid much attention to this topic and have proposed different approaches to modeling long sequential user behavior data. Among them, the memory-network-based model MIMN, proposed by Alibaba, achieves state-of-the-art (SOTA) performance through the co-design of the learning algorithm and the serving system. MIMN is the first industrial solution that can model sequential user behavior data with lengths of up to 1,000. However, MIMN fails to precisely capture user interests with respect to a specific candidate item when the length of the user behavior sequence increases further, say, by 10 times or more. This challenge exists widely in previously proposed approaches. In this paper, we tackle this problem by designing a new modeling paradigm, which we name the Search-based Interest Model (SIM). SIM extracts user interests with two cascaded search units: (i) the General Search Unit (GSU) performs a general search over the raw and arbitrarily long sequential behavior data, using query information from the candidate item, and obtains a Sub user Behavior Sequence (SBS) relevant to the candidate item; (ii) the Exact Search Unit (ESU) models the precise relationship between the candidate item and the SBS. This cascaded search paradigm gives SIM a better ability to model lifelong sequential behavior data in terms of both scalability and accuracy. Apart from the learning algorithm, we also share our hands-on experience implementing SIM in large-scale industrial systems. Since 2019, SIM has been deployed in the display advertising system at Alibaba, bringing a 7.1% CTR lift and a 4.4% RPM lift, which is significant to the business. SIM now serves the main traffic in our real system and models user behavior data with maximum lengths of up to 54,000, 54 times the previous SOTA.
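
The two-stage cascade described above can be sketched as a cheap filter followed by attention over the short result. Category matching as the GSU filter and single-head dot-product softmax attention as the ESU are simplifications chosen for this sketch. The paper's units are learned components and may differ. The `Behavior` record and its fields are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Behavior:
    item_id: int
    category: int
    embedding: List[float]  # hypothetical pre-computed item embedding
    timestamp: int


def general_search_unit(history: Sequence[Behavior], candidate_category: int, k: int) -> List[Behavior]:
    """Cheap filter over an arbitrarily long history: keep behaviors whose category
    matches the candidate's, most recent first, truncated to k (the SBS)."""
    matched = [b for b in history if b.category == candidate_category]
    matched.sort(key=lambda b: b.timestamp, reverse=True)
    return matched[:k]


def exact_search_unit(sbs: Sequence[Behavior], candidate_embedding: Sequence[float]) -> List[float]:
    """Softmax attention over the short SBS with the candidate item as the query;
    returns an interest vector as the weighted average of behavior embeddings."""
    if not sbs:
        return [0.0] * len(candidate_embedding)
    scores = [sum(q * v for q, v in zip(candidate_embedding, b.embedding)) for b in sbs]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * b.embedding[d] for w, b in zip(weights, sbs)) / total
        for d in range(len(candidate_embedding))
    ]


if __name__ == "__main__":
    history = [
        Behavior(1, 7, [1.0, 0.0], 10),
        Behavior(2, 3, [0.0, 1.0], 11),
        Behavior(3, 7, [0.5, 0.5], 12),
        Behavior(4, 7, [0.0, 1.0], 5),
    ]
    sbs = general_search_unit(history, candidate_category=7, k=2)
    print([b.item_id for b in sbs])  # [3, 1]
    print(exact_search_unit(sbs, candidate_embedding=[1.0, 0.0]))
```

The split mirrors the abstract's scalability argument. The GSU is cheap enough to scan very long histories, so the costlier attention in the ESU only ever sees a short SBS.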

behavior data, deep learning, user behavior

2006.05639

Country:

Oceania > Australia (0.14)
North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report (0.50)

Industry:

Information Technology > Services (0.87)
Marketing (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Efficient Model-Free Reinforcement Learning Using Gaussian Process

Fan, Ying, Chen, Letian, Wang, Yizhou

arXiv.org Machine Learning, Dec-11-2018

Efficient reinforcement learning usually takes advantage of demonstrations or a good exploration strategy. By applying posterior sampling in model-free RL under a Gaussian process (GP) hypothesis, we propose the Gaussian Process Posterior Sampling Reinforcement Learning (GPPSTD) algorithm for continuous state spaces, giving theoretical justifications and empirical results. We also provide theoretical and empirical results showing that various demonstrations can lower the expected uncertainty and benefit posterior-sampling exploration. In this way, we combine the demonstration and exploration processes to achieve more efficient reinforcement learning.
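
Posterior sampling with a Gaussian process can be sketched generically. Fit a GP to observed (action, value-target) pairs, draw one function from the posterior at the candidate actions, and act greedily on that draw (Thompson sampling). The RBF kernel, fixed noise level, scalar actions at a single fixed state, and all names are assumptions of this illustration. It is not the GPPSTD algorithm itself, which the abstract does not specify in detail.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-2, length_scale=1.0):
    """Posterior mean and covariance of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    k_qq = rbf_kernel(x_query, x_query, length_scale)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    cov = k_qq - k_qt @ np.linalg.solve(k_tt, k_qt.T)
    return mean, 0.5 * (cov + cov.T)  # symmetrize away rounding error


def thompson_action(x_train, y_train, candidate_actions, rng):
    """Draw one plausible value function from the posterior and act greedily on it."""
    mean, cov = gp_posterior(x_train, y_train, candidate_actions)
    jitter = 1e-8 * np.eye(len(candidate_actions))
    sampled_values = rng.multivariate_normal(mean, cov + jitter)
    return candidate_actions[int(np.argmax(sampled_values))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    actions_seen = np.array([0.0, 1.0, 2.0])
    value_targets = np.array([0.0, 1.0, 0.0])  # hypothetical noisy value estimates
    candidates = np.linspace(-1.0, 3.0, 9)
    print(thompson_action(actions_seen, value_targets, candidates, rng))
```

Candidates far from the observed data carry high posterior variance and occasionally win the draw, which is how posterior sampling drives exploration. Loosely, extra demonstration data would add training points that shrink this variance.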

artificial intelligence, machine learning, reinforcement learning

1812.04359

Country: North America > United States > New York (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep Interest Evolution Network for Click-Through Rate Prediction

Zhou, Guorui, Mou, Na, Fan, Ying, Pi, Qi, Bian, Weijie, Zhou, Chang, Zhu, Xiaoqiang, Gai, Kun

arXiv.org Machine Learning, Sep-10-2018

Click-through rate (CTR) prediction, whose goal is to estimate the probability that a user clicks, has become one of the core tasks in advertising systems. For a CTR prediction model, it is necessary to capture the latent user interest behind user behavior data. Moreover, because of changes in the external environment and in internal cognition, user interest evolves dynamically over time. Several CTR prediction methods model interest, but most of them directly regard the representation of behavior as the interest and lack dedicated modeling of the latent interest behind concrete behavior. In addition, few works consider the changing trend of interest. In this paper, we propose a novel model, named Deep Interest Evolution Network (DIEN), for CTR prediction. Specifically, we design an interest extractor layer to capture temporal interests from the historical behavior sequence. At this layer, we introduce an auxiliary loss to supervise interest extraction at each step. As user interests are diverse, especially in e-commerce systems, we propose an interest evolving layer to capture the interest evolving process that is relevant to the target item. At the interest evolving layer, an attention mechanism is embedded into the sequential structure in a novel way, and the effects of relevant interests are strengthened during interest evolution. In experiments on both public and industrial datasets, DIEN significantly outperforms state-of-the-art solutions. Notably, DIEN has been deployed in the display advertising system of Taobao and obtained a 20.7% improvement in CTR.
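
The interest evolving layer above embeds attention into a recurrent unit. To our understanding, the DIEN paper does this with a GRU whose update gate is scaled by the attention score (AUGRU). The minimal NumPy sketch below follows that idea. Random weights, dimensions, the bilinear attention form, and all names are assumptions. The interest extractor layer and the auxiliary loss are omitted, and the interest states are simply taken as given.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class AUGRUCell:
    """GRU cell whose update gate is rescaled by an attention score a_t in [0, 1]."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: np.random.Generator):
        self.hidden_dim = hidden_dim

        def w(rows: int, cols: int) -> np.ndarray:
            return rng.normal(scale=0.1, size=(rows, cols))

        # Update gate (u), reset gate (r) and candidate state (h) parameters.
        self.W_u, self.U_u, self.b_u = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim), np.zeros(hidden_dim)
        self.W_r, self.U_r, self.b_r = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim), np.zeros(hidden_dim)
        self.W_h, self.U_h, self.b_h = w(hidden_dim, input_dim), w(hidden_dim, hidden_dim), np.zeros(hidden_dim)

    def step(self, x: np.ndarray, h_prev: np.ndarray, attention: float) -> np.ndarray:
        u = sigmoid(self.W_u @ x + self.U_u @ h_prev + self.b_u)
        r = sigmoid(self.W_r @ x + self.U_r @ h_prev + self.b_r)
        h_tilde = np.tanh(self.W_h @ x + r * (self.U_h @ h_prev) + self.b_h)
        u_att = attention * u  # low attention: the hidden state barely moves
        return (1.0 - u_att) * h_prev + u_att * h_tilde


def attention_scores(interest_states, target_embedding, W_att):
    """Softmax over bilinear scores h_t^T W_att e_target."""
    logits = np.array([h @ W_att @ target_embedding for h in interest_states])
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()


def evolve_interest(cell, interest_states, target_embedding, W_att):
    """Run the AUGRU over the interest states, gated by target-aware attention."""
    scores = attention_scores(interest_states, target_embedding, W_att)
    h = np.zeros(cell.hidden_dim)
    for i_t, a_t in zip(interest_states, scores):
        h = cell.step(i_t, h, float(a_t))
    return h  # final evolved interest representation


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cell = AUGRUCell(input_dim=4, hidden_dim=3, rng=rng)
    states = [rng.normal(size=4) for _ in range(5)]  # stand-in interest states
    print(evolve_interest(cell, states, rng.normal(size=2), rng.normal(size=(4, 2))))
```

Scaling the update gate, rather than the input, means an interest state with near-zero attention leaves the evolving hidden state almost untouched.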

deep learning, neural network, target item

1809.03672

Country: Asia > China (0.14)

Genre: Research Report > Promising Solution (0.54)

Industry:

Marketing (0.67)
Information Technology > Services (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
