
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

arXiv.org Artificial Intelligence

The predominant paradigm for learning at scale today involves pre-training models on diverse prior data and then fine-tuning them on narrower, domain-specific data to specialize them to particular downstream tasks [7, 4, 9, 37, 55, 50, 59]. In the context of learning decision-making policies, this paradigm translates to pre-training on a large amount of previously collected static experience via offline reinforcement learning (RL), followed by efficiently fine-tuning these initializations via online RL. Generally, this fine-tuning is done by continuing training with the very same offline RL algorithm, e.g., pessimistic algorithms [28, 6] or algorithms that apply behavioral constraints [14, 27], on a mixture of offline data and autonomously collected online data, with only minor modifications to the offline RL algorithm itself [33]. While this paradigm has led to promising results [27, 33], RL fine-tuning requires continued training on offline data for stability and performance ([56, 57]; Section 3), in contrast to standard practice in the rest of machine learning.

Retaining offline data is problematic for several reasons. First, as offline datasets grow in size and diversity, continued online training on offline data becomes inefficient and expensive, and such compute requirements may even deter practitioners from using online RL for fine-tuning at all. Second, the need to retain offline data arguably defeats the point of offline RL pre-training altogether: recent results [47], corroborated by our experiments in Section 3, indicate that current fine-tuning approaches are unable to make good use of several strong offline RL value and/or policy initializations, as evidenced by the superior performance of simply running online RL from scratch with the offline data placed in the replay buffer [3]. These problems call the efficacy of current RL fine-tuning approaches into question. In this paper, we aim to understand and address these shortcomings of current online fine-tuning methods and to build an online RL approach that does not retain offline data.
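To make the recipe the abstract critiques concrete, below is a minimal sketch of online fine-tuning that retains offline data: every gradient step samples a batch mixed from the static offline dataset and the growing online buffer. All names (`sample_batch`, `agent_update`), the dummy data, and the 50/50 mixing ratio are illustrative assumptions, not the paper's actual algorithm or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy transition storage: each row stands in for one (s, a, r, s') transition.
offline_buffer = rng.normal(size=(10_000, 8))   # large static pre-training dataset
online_buffer = []                              # filled by autonomous online interaction

def sample_batch(buffer, n):
    """Uniformly sample n transitions from a buffer."""
    idx = rng.integers(0, len(buffer), size=n)
    return np.asarray(buffer)[idx]

def agent_update(batch):
    """Stand-in for an offline-RL-style update (e.g., a pessimistic or
    behavior-constrained TD step); here it just returns a dummy loss."""
    return float(np.mean(batch ** 2))

batch_size = 256
for step in range(1_000):
    # Simulate collecting one fresh online transition per environment step.
    online_buffer.append(rng.normal(size=8))

    # The "retain offline data" recipe: each update mixes offline and online
    # samples (a 50/50 split is a common choice, assumed here for illustration).
    half = batch_size // 2
    batch = np.concatenate([
        sample_batch(offline_buffer, half),
        sample_batch(online_buffer, batch_size - half),
    ])
    loss = agent_update(batch)
```

As the offline dataset grows, the cost of repeatedly sampling and training on it during fine-tuning grows with it, which is precisely the inefficiency the paper sets out to remove.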