Yao, Chengyuan
Reward Shaping to Mitigate Reward Hacking in RLHF
Fu, Jiayi, Zhao, Xuandong, Yao, Chengyuan, Wang, Heng, Han, Qi, Xiao, Yanghua
Reinforcement Learning from Human Feedback (RLHF) is essential for aligning large language models (LLMs) with human values. However, RLHF is susceptible to reward hacking, where the agent exploits flaws in the reward function rather than learning the intended behavior, thus degrading alignment. While reward shaping helps stabilize RLHF and partially mitigate reward hacking, a systematic investigation into shaping techniques and their underlying principles remains lacking. To bridge this gap, we present a comprehensive study of the prevalent reward shaping methods. Our analysis suggests three key design principles: (1) RL reward is ideally bounded, (2) RL benefits from rapid initial growth followed by gradual convergence, and (3) RL reward is best formulated as a function of centered reward. Guided by these insights, we propose Preference As Reward (PAR), a novel approach that leverages the latent preferences embedded within the reward model itself as the signal for reinforcement learning. We evaluated PAR on two base models, Gemma2-2B and Llama3-8B, using two datasets, Ultrafeedback-Binarized and HH-RLHF. Experimental results demonstrate PAR's superior performance over other reward shaping methods. On the AlpacaEval 2.0 benchmark, PAR achieves a win rate at least 5 percentage points higher than competing approaches. Furthermore, PAR exhibits remarkable data efficiency, requiring only a single reference reward for optimal performance, and maintains robustness against reward hacking even after two full epochs of training. Code is available at https://github.com/PorUna-byte/PAR.
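A minimal sketch of a shaping rule consistent with the three principles above (the function name and reward-model interface are illustrative assumptions; see the linked repository for the actual PAR implementation): the raw reward-model score is centered against a single reference reward and passed through a sigmoid, which is bounded, rises quickly near the reference point, and saturates afterwards.

    import torch

    def par_shaped_reward(reward: torch.Tensor, reference_reward: torch.Tensor) -> torch.Tensor:
        """Map raw reward-model scores to a bounded RL signal.

        The output lies in (0, 1), grows fastest around the reference point and
        saturates afterwards, and depends only on the centered reward
        (reward - reference_reward). Under a Bradley-Terry reward model the
        sigmoid can be read as the probability that the sampled response is
        preferred over the reference response.
        """
        return torch.sigmoid(reward - reference_reward)

    # Illustrative usage: one reference reward per prompt (e.g., the score of an
    # SFT response), broadcast against the rewards of sampled responses.
    rewards = torch.tensor([1.2, 3.5, -0.4])
    reference = torch.tensor([0.8])
    print(par_shaped_reward(rewards, reference))  # values in (0, 1)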
Automated Discovery of Adaptive Attacks on Adversarial Defenses
Yao, Chengyuan, Bielik, Pavol, Tsankov, Petar, Vechev, Martin
Reliable evaluation of adversarial defenses is a challenging task, currently limited to an expert who manually crafts attacks that exploit the defense's inner workings, or to approaches based on ensemble of fixed attacks, none of which may be effective for the specific defense at hand. Our key observation is that custom attacks are composed from a set of reusable building blocks, such as fine-tuning relevant attack parameters, network transformations, and custom loss functions.
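A hedged sketch of the building-block idea: a small, illustrative search space over attack parameters, surrogate losses, and network transformations, explored here with plain random search. The block names, the `evaluate` callback, and the search strategy are assumptions for illustration and do not reproduce the paper's actual search space or algorithm.

    import itertools
    import random
    from dataclasses import dataclass
    from typing import Callable

    # Illustrative building blocks (not the paper's actual search space).
    STEP_COUNTS = [10, 50, 100]
    LOSS_FNS = ["cross_entropy", "margin", "logit_diff"]
    TRANSFORMS = ["none", "remove_randomization", "bpda_on_nondiff"]

    @dataclass(frozen=True)
    class AttackConfig:
        steps: int
        loss_fn: str
        transform: str

    def search_attack(evaluate: Callable[[AttackConfig], float], budget: int = 20) -> AttackConfig:
        """Sample attack configurations from the block search space and keep the
        one with the highest score returned by `evaluate`, a hypothetical callback
        that builds the configured attack and runs it against the defended model."""
        space = [AttackConfig(s, l, t)
                 for s, l, t in itertools.product(STEP_COUNTS, LOSS_FNS, TRANSFORMS)]
        candidates = random.sample(space, min(budget, len(space)))
        return max(candidates, key=evaluate)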
Deep Learning for Post-Processing Ensemble Weather Forecasts
Grönquist, Peter, Yao, Chengyuan, Ben-Nun, Tal, Dryden, Nikoli, Dueben, Peter, Li, Shigang, Hoefler, Torsten
Quantifying uncertainty in weather forecasts is critical, especially for predicting extreme weather events. This is typically accomplished with ensemble prediction systems, which consist of many perturbed numerical weather simulations, or trajectories, run in parallel. These systems are associated with a high computational cost and often involve statistical post-processing steps to inexpensively improve their raw prediction qualities. We propose a mixed model that uses only a subset of the original weather trajectories combined with a post-processing step using deep neural networks. These enable the model to account for non-linear relationships that are not captured by current numerical models or post-processing methods. Applied to global data, our mixed models achieve a relative improvement in ensemble forecast skill (CRPS) of over 14%. Furthermore, we demonstrate that the improvement is larger for extreme weather events on select case studies. We also show that our post-processing can use fewer trajectories to achieve comparable results to the full ensemble. By using fewer trajectories, the computational costs of an ensemble prediction system can be reduced, allowing it to run at higher resolution and produce more accurate forecasts.
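A toy sketch of the post-processing idea, assuming a Gaussian predictive distribution trained with the standard closed-form CRPS; the paper's networks operate on global gridded fields, whereas the illustrative module below only corrects the mean and spread of a reduced ensemble for a single variable and location. All names and the architecture are assumptions, not the paper's implementation.

    import math
    import torch
    import torch.nn as nn

    def gaussian_crps(mu: torch.Tensor, sigma: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        """Closed-form CRPS of a Gaussian forecast N(mu, sigma^2) against
        observation y; lower values mean sharper, better-calibrated forecasts."""
        z = (y - mu) / sigma
        normal = torch.distributions.Normal(0.0, 1.0)
        pdf = torch.exp(normal.log_prob(z))
        cdf = normal.cdf(z)
        return (sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))).mean()

    class PostProcessor(nn.Module):
        """Toy post-processor: maps the mean and spread of a reduced ensemble
        (e.g., 5 of 50 trajectories) to a corrected mean and spread."""
        def __init__(self, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 2))

        def forward(self, subset: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
            # subset: (batch, n_members) raw forecasts for one variable/location.
            feats = torch.stack([subset.mean(dim=1), subset.std(dim=1)], dim=1)
            out = self.net(feats)
            mu = out[:, 0]
            sigma = nn.functional.softplus(out[:, 1]) + 1e-3  # keep spread positive
            return mu, sigma

Training would minimize gaussian_crps(mu, sigma, observation) over historical forecast-observation pairs, so the network learns both to correct the subset mean and to widen or narrow its spread where the reduced ensemble is over- or under-dispersive.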