
Collaborating Authors

Zhang, Fuxiang


Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models

arXiv.org Artificial Intelligence

Low sample efficiency is an enduring challenge of reinforcement learning (RL). With the advent of versatile large language models (LLMs), recent works impart common-sense knowledge to accelerate policy learning in RL. However, we note that such guidance is often tailored to one specific task and therefore does not generalize. In this paper, we introduce a framework that harnesses LLMs to extract background knowledge of an environment, which captures a general understanding of the entire environment, so that various downstream RL tasks can benefit from a one-time knowledge representation. We ground LLMs by feeding them a few pre-collected experiences and asking them to delineate the background knowledge of the environment. Afterward, we represent the output knowledge as potential functions for potential-based reward shaping, which has the desirable property of preserving the optimal policy induced by the task rewards. We instantiate three variants of prompting LLMs for background knowledge: writing code, annotating preferences, and assigning goals. Our experiments show that these methods achieve significant improvements in sample efficiency across a spectrum of downstream tasks from the Minigrid and Crafter domains.
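To make the reward-shaping mechanism concrete, the sketch below shows how a potential function could be folded into the task reward so that only the shaping term, not the optimal policy, changes. This is a minimal illustration, not the paper's implementation: the function names and the toy grid-world potential (standing in for LLM-derived background knowledge) are assumptions.

```python
# Minimal sketch of potential-based reward shaping, assuming an (LLM-derived)
# potential function `phi` over states; all names are illustrative.

def shaped_reward(r, s, s_next, phi, gamma=0.99, done=False):
    """Augment the task reward r with the shaping term gamma * phi(s') - phi(s).

    Because the shaping term is a potential difference, the optimal policy
    under the shaped reward matches the optimal policy under the task reward.
    """
    next_potential = 0.0 if done else gamma * phi(s_next)
    return r + next_potential - phi(s)


# A toy potential that could stand in for background knowledge such as
# "states closer to the goal are more promising" in a grid world.
def phi(state):
    goal = (7, 7)
    return -abs(state[0] - goal[0]) - abs(state[1] - goal[1])


if __name__ == "__main__":
    s, s_next = (2, 3), (3, 3)                    # one step toward the goal
    print(shaped_reward(0.0, s, s_next, phi))     # positive shaping bonus
```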


Q-Adapter: Training Your LLM Adapter as a Residual Q-Function

arXiv.org Artificial Intelligence

We consider the problem of adapting Large Language Models (LLMs) pre-trained with Reinforcement Learning from Human Feedback (RLHF) to downstream preference data. Naive approaches to this include supervised fine-tuning on preferred responses or reinforcement learning with a learned reward model. However, these risk the LLM forgetting its initial knowledge as fine-tuning progresses. To customize the LLM while preserving its existing capabilities, this paper proposes a novel method, named Q-Adapter. We start by formalizing LLM adaptation as the problem of maximizing a linear combination of two rewards, one corresponding to the reward optimized by the pre-trained LLM and the other to the downstream preference data. Although both rewards are unknown, we show that this problem can be solved by directly learning a new module from the preference data that approximates the residual Q-function. We regard this module as an adapter because, combined with the original pre-trained LLM, it forms the optimal customized LLM. Empirically, experiments on a range of domain-specific tasks and safety alignment tasks illustrate the superiority of Q-Adapter in both anti-forgetting and learning from new preferences.
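The sketch below illustrates the residual-adapter idea at a high level: a frozen base LLM provides next-token logits, and a small trainable head supplies residual Q-values that are blended in at decoding time. The specific combination rule (log-softmax of the base logits plus a temperature-scaled residual Q) and all module names are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch: combining a frozen base LM with a learned residual-Q adapter
# head at decoding time. The combination rule below is an assumption for
# illustration; all module names are hypothetical.

import torch
import torch.nn as nn


class ResidualQAdapter(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        # Small trainable head; the base LLM stays frozen.
        self.q_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states):
        # Residual Q-value for every candidate next token.
        return self.q_head(hidden_states)


def customized_next_token_dist(base_logits, residual_q, beta=1.0):
    """Blend the frozen base policy with the residual Q-values.

    base_logits: [batch, vocab] logits from the pre-trained (frozen) LLM.
    residual_q:  [batch, vocab] output of the adapter head.
    beta:        temperature trading off old capabilities vs. new preferences.
    """
    combined = torch.log_softmax(base_logits, dim=-1) + residual_q / beta
    return torch.softmax(combined, dim=-1)
```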


Policy Regularization with Dataset Constraint for Offline Reinforcement Learning

arXiv.org Artificial Intelligence

We consider the problem of learning the best possible policy from a fixed dataset, known as offline Reinforcement Learning (RL). A common class of existing offline RL methods is policy regularization, which typically constrains the learned policy to the distribution or support of the behavior policy. However, distribution and support constraints are overly conservative, since both force the policy to choose actions similar to those of the behavior policy at particular states. This limits the learned policy's performance, especially when the behavior policy is sub-optimal. In this paper, we find that regularizing the policy towards the nearest state-action pair can be more effective, and thus propose Policy Regularization with Dataset Constraint (PRDC). When updating the policy at a given state, PRDC searches the entire dataset for the nearest state-action sample and then restricts the policy with the action of this sample. Unlike previous works, PRDC can guide the policy with proper behaviors from the dataset, allowing it to choose actions that do not appear in the dataset paired with the given state. It is a softer constraint, but it still retains enough conservatism against out-of-distribution actions. Empirical evidence and theoretical analysis show that PRDC can alleviate offline RL's fundamentally challenging value overestimation issue with a bounded performance gap. Moreover, on a set of locomotion and navigation tasks, PRDC achieves state-of-the-art performance compared with existing methods. Code is available at https://github.com/LAMDA-RL/PRDC
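To make the nearest state-action search concrete, here is a minimal sketch of a dataset-constraint regularizer in this spirit: it indexes the offline dataset on scaled state-action vectors and penalizes the distance between the policy's proposed action and the action of the retrieved nearest sample. The scaling factor, the KD-tree search, and the squared-L2 penalty are illustrative assumptions; the authors' own implementation is in the linked repository.

```python
# Hedged sketch of a dataset-constraint regularizer in the spirit of PRDC;
# see https://github.com/LAMDA-RL/PRDC for the authors' implementation.

import numpy as np
from scipy.spatial import cKDTree


class DatasetConstraint:
    def __init__(self, states, actions, beta=2.0):
        # Index the dataset on concatenated [beta * state, action] vectors so
        # the search is biased toward matching the query state.
        self.beta = beta
        self.actions = actions
        self.tree = cKDTree(np.concatenate([beta * states, actions], axis=1))

    def penalty(self, state, policy_action):
        """Squared L2 distance between the policy action and its nearest dataset action."""
        query = np.concatenate([self.beta * state, policy_action])
        _, idx = self.tree.query(query)
        nearest_action = self.actions[idx]
        return float(np.sum((policy_action - nearest_action) ** 2))
```

In use, a term such as `lambda_reg * constraint.penalty(s, pi(s))` would be added to the actor loss alongside the usual value-maximization objective.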


Multi-agent Continual Coordination via Progressive Task Contextualization

arXiv.org Artificial Intelligence

Cooperative Multi-agent Reinforcement Learning (MARL) has attracted prominent attention in recent years [1] and has achieved great progress in multiple areas, such as path finding [2], active voltage control [3], and dynamic algorithm configuration [4]. Among the many existing methods, some works focus on improving coordination ability by addressing specific challenges, including non-stationarity [5], credit assignment [6], and scalability [7]. Other works investigate cooperative MARL from additional angles, such as efficient communication [8], zero-shot coordination (ZSC) [9], and policy robustness [10]. Many methods have emerged as promising solutions for different scenarios, including policy-based ones [11,12], value-based series [13,14], and other variants, showing remarkable coordination ability in a wide range of tasks such as SMAC [15]. Despite this success, mainstream cooperative MARL methods are still restricted to being trained on a single task, or on multiple tasks simultaneously, assuming the agents have access to data from all tasks at all times; this is unrealistic for physical agents in the real world, which can only attend to one task at a time. Continual reinforcement learning offers a promising approach to this problem [16], where the agent aims to avoid catastrophic forgetting while enabling knowledge transfer to new tasks (a.k.a. forward transfer).