Chang, Cheng
Secrets of RLHF in Large Language Models Part I: PPO
Zheng, Rui, Dou, Shihan, Gao, Songyang, Hua, Yuan, Shen, Wei, Wang, Binghai, Liu, Yan, Jin, Senjie, Liu, Qin, Zhou, Yuhao, Xiong, Limao, Chen, Lu, Xi, Zhiheng, Xu, Nuo, Lai, Wenbin, Zhu, Minghao, Chang, Cheng, Yin, Zhangyue, Weng, Rongxiang, Cheng, Wensen, Huang, Haoran, Sun, Tianxiang, Yan, Hang, Gui, Tao, Zhang, Qi, Qiu, Xipeng, Huang, Xuanjing
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Their primary objective is to function as human-centric (helpful, honest, and harmless) assistants. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include \textbf{reward models} to measure human preferences, \textbf{Proximal Policy Optimization} (PPO) to optimize policy model outputs, and \textbf{process supervision} to improve step-by-step reasoning capabilities. However, the challenges of reward design, environment interaction, and agent training, coupled with the huge trial-and-error cost of large language models, pose a significant barrier for AI researchers seeking to advance technical alignment and the safe deployment of LLMs. The stable training of RLHF remains a puzzle. In this first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising the PPO algorithm impact policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. Therefore, we explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLM alignment. Therefore, we are eager to release technical reports, reward models, and PPO code, aiming to make modest contributions to the advancement of LLMs.
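To make the role of a policy constraint concrete, below is a minimal, hedged sketch of a PPO clipped surrogate loss augmented with a KL penalty toward a frozen reference (SFT) policy, which is the general kind of constraint the report highlights. The tensor names (logprobs, old_logprobs, ref_logprobs, advantages) and the coefficient values are illustrative assumptions, not the authors' exact PPO-max recipe.

```python
import torch

def ppo_kl_loss(logprobs, old_logprobs, ref_logprobs, advantages,
                clip_eps=0.2, kl_coef=0.1):
    """Clipped PPO surrogate plus a KL penalty toward a frozen reference policy.

    All inputs are per-token log-probabilities / advantages of shape (batch, seq).
    This is an illustrative sketch, not the paper's exact implementation.
    """
    # Importance ratio between the current policy and the rollout-time policy.
    ratio = torch.exp(logprobs - old_logprobs)

    # Standard clipped surrogate objective (maximized, so negated for a loss).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Policy constraint: a simple estimate of KL(current || reference) keeps the
    # policy close to the SFT/reference model, which stabilizes RLHF training.
    approx_kl = (logprobs - ref_logprobs).mean()
    return policy_loss + kl_coef * approx_kl
```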
A hybrid data driven-physics constrained Gaussian process regression framework with deep kernel for uncertainty quantification
Chang, Cheng, Zeng, Tieyong
In many practical fields such as engineering, finance, and physics, we need to explore the effects of uncertainties in inputs or parameters, which is the task of uncertainty quantification (UQ). In a specific UQ problem, if the system is represented as stochastic partial differential equations, then stochastic Galerkin methods such as generalised polynomial chaos (gPC) [29] can be used to obtain statistics of the distribution of the unknown solution. As long as a numerical or analytical solver for the system is available, the renowned Monte Carlo (MC) method [8] can be applied to obtain a distribution of the quantities of interest. However, this is usually prohibitively expensive owing to the slow convergence of the Monte Carlo method, especially since a single run of the computer code for a complicated system may take days. Thus, to some extent we may be willing to trade accuracy for running time by formulating a cheap surrogate [2].
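The following is a minimal sketch of the surrogate idea: brute-force Monte Carlo calls the expensive solver once per sample, whereas a Gaussian process surrogate is trained on a small budget of solver runs and then queried cheaply. The toy model expensive_solver, the use of scikit-learn's GaussianProcessRegressor, and the sample sizes are illustrative assumptions, not the paper's hybrid deep-kernel framework.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def expensive_solver(x):
    """Stand-in for a costly simulation mapping an uncertain input to a quantity of interest."""
    return np.sin(3.0 * x) + 0.5 * x**2

# Plain Monte Carlo: every sample requires one full solver run.
xs_mc = rng.normal(0.0, 1.0, size=10_000)
qoi_mc = expensive_solver(xs_mc)

# Surrogate approach: spend a small budget of solver runs on training data,
# then query the cheap Gaussian process surrogate for the remaining samples.
x_train = np.linspace(-3.0, 3.0, 30).reshape(-1, 1)
y_train = expensive_solver(x_train).ravel()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(x_train, y_train)

xs_new = rng.normal(0.0, 1.0, size=10_000).reshape(-1, 1)
qoi_surrogate = gp.predict(xs_new)

print("MC mean/std:       ", qoi_mc.mean(), qoi_mc.std())
print("Surrogate mean/std:", qoi_surrogate.mean(), qoi_surrogate.std())
```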