Bai, Yunpeng
BAFFLE: Hiding Backdoors in Offline Reinforcement Learning Datasets
Gong, Chen, Yang, Zhou, Bai, Yunpeng, He, Junda, Shi, Jieke, Li, Kecen, Sinha, Arunesh, Xu, Bowen, Hou, Xinwen, Lo, David, Wang, Tianhao
Reinforcement learning (RL) lets an agent learn from trial-and-error experiences gathered through interaction with the environment. Recently, offline RL has become a popular RL paradigm because it avoids the cost and risk of interacting with environments: data providers share large pre-collected datasets, and others can train high-quality agents without ever interacting with the environments. This paradigm has demonstrated effectiveness in critical tasks such as robot control and autonomous driving. However, less attention has been paid to investigating the security threats to offline RL systems. This paper focuses on backdoor attacks, in which perturbations are added to the data (observations) such that the agent takes high-reward actions given normal observations but low-reward actions given observations injected with the trigger. We propose Baffle (Backdoor Attack for Offline Reinforcement Learning), an approach that automatically implants backdoors into RL agents by poisoning the offline RL dataset, and we evaluate how different offline RL algorithms react to this attack. Our experiments, conducted on four tasks and four offline RL algorithms, expose a disquieting fact: none of the existing offline RL algorithms is immune to such a backdoor attack. More specifically, Baffle modifies 10\% of the datasets for four tasks (3 robotic control and 1 autonomous driving). Agents trained on the poisoned datasets perform well in normal settings. However, when triggers are presented, the agents' performance decreases drastically, by 63.2\%, 53.9\%, 64.7\%, and 47.4\% on the four tasks on average. The backdoor persists even after fine-tuning the poisoned agents on clean datasets. We further show that the inserted backdoor is hard to detect with a popular defense method. This paper calls for the development of more effective protection for open-source offline RL datasets.
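To make the poisoning step concrete, the sketch below shows one plausible way an attacker could modify a fraction of an offline dataset: stamp a trigger pattern into selected observations, pair them with a low-performing action, and relabel the reward so the agent learns to prefer that action whenever the trigger appears. The trigger pattern, the poison rate, and the weak_policy helper are illustrative assumptions, not Baffle's exact procedure.

    import numpy as np

    def poison_offline_dataset(dataset, weak_policy, poison_rate=0.1,
                               trigger_dims=slice(0, 3), trigger_value=1.0):
        """Illustrative sketch of backdoor poisoning for an offline RL dataset.

        dataset: dict of NumPy arrays with keys 'observations', 'actions', 'rewards'.
        weak_policy: callable mapping an observation to a low-performing action.
        """
        n = len(dataset["observations"])
        idx = np.random.choice(n, size=int(poison_rate * n), replace=False)
        for i in idx:
            obs = dataset["observations"][i].copy()
            obs[trigger_dims] = trigger_value                 # stamp the trigger into the observation
            dataset["observations"][i] = obs
            dataset["actions"][i] = weak_policy(obs)          # pair the trigger with a poorly performing action
            dataset["rewards"][i] = dataset["rewards"].max()  # relabel with a high reward so the agent imitates it
        return dataset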
Temporal-spatial Correlation Attention Network for Clinical Data Analysis in Intensive Care Unit
Nie, Weizhi, Yu, Yuhe, Zhang, Chen, Song, Dan, Zhao, Lina, Bai, Yunpeng
In recent years, medical information technology has made it possible for electronic health records (EHRs) to store fairly complete clinical data, bringing health care into the era of "big data". However, medical data are often sparse and strongly correlated, which makes many clinical problems difficult to solve effectively. The rapid development of deep learning in recent years has created opportunities for using big data in healthcare. In this paper, we propose a temporal-spatial correlation attention network (TSCAN) to handle clinical characteristic prediction problems such as predicting mortality, predicting length of stay, detecting physiologic decline, and classifying phenotypes. Based on the design of the attention mechanism, our approach can effectively remove irrelevant items in the clinical data and irrelevant time nodes according to the task at hand, so as to obtain more accurate predictions. Our method can also identify key clinical indicators of important outcomes, which can be used to improve treatment options. Our experiments use the publicly available Medical Information Mart for Intensive Care (MIMIC-IV) database. Compared with other state-of-the-art (SOTA) prediction methods, we achieve a performance improvement of 2.0\%, reaching 90.7\% on mortality prediction and 45.1\% on length-of-stay prediction. The source code can be found at: \url{https://github.com/yuyuheintju/TSCAN}.
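As a rough illustration of the idea (not the TSCAN architecture itself), the following PyTorch sketch scores each clinical variable ("spatial" attention) and each time step (temporal attention) before pooling the sequence into a single prediction; the layer sizes, gating choices, and pooling scheme are assumptions.

    import torch
    import torch.nn as nn

    class TemporalSpatialAttention(nn.Module):
        """Minimal sketch: weight clinical variables and time steps, then pool."""

        def __init__(self, n_features: int):
            super().__init__()
            self.feature_scorer = nn.Linear(n_features, n_features)  # attention over variables ("spatial")
            self.time_scorer = nn.Linear(n_features, 1)              # attention over time steps
            self.head = nn.Linear(n_features, 1)                     # e.g. an in-hospital mortality logit

        def forward(self, x):                                        # x: (batch, time, n_features)
            x = x * torch.sigmoid(self.feature_scorer(x))            # down-weight irrelevant items
            w = torch.softmax(self.time_scorer(x), dim=1)            # weight informative time steps
            return self.head((x * w).sum(dim=1))

    # usage: logits = TemporalSpatialAttention(n_features=17)(torch.randn(8, 48, 17))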
Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning
Xu, Zhiwei, Li, Dapeng, Zhang, Bin, Zhan, Yuan, Bai, Yunpeng, Fan, Guoliang
Recently, model-based agents have achieved better performance than model-free ones in single-agent environments under the same computational budget and training time. However, due to the complexity of multi-agent systems, it is difficult to learn a model of the environment, and significant compounding errors may hinder the learning process when model-based methods are applied to multi-agent tasks. This paper proposes an implicit model-based multi-agent reinforcement learning method built on value decomposition methods. Under this method, agents can interact with the learned virtual environment and evaluate the current state value according to imagined future states in the latent space, giving the agents foresight. Our approach can be applied to any multi-agent value decomposition method. The experimental results show that our method improves sample efficiency in different partially observable Markov decision process domains.
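A minimal sketch of the foresight idea: encode the current observation into a latent state, roll a learned latent model forward for a few imagined steps, and average the values of the imagined states. The module shapes, the one-step latent model, and the averaging rule are assumptions for illustration, not the paper's exact design.

    import torch
    import torch.nn as nn

    class LatentImagination(nn.Module):
        """Sketch: value estimation from imagined future latent states."""

        def __init__(self, obs_dim, latent_dim=32, horizon=3):
            super().__init__()
            self.horizon = horizon
            self.encoder = nn.Linear(obs_dim, latent_dim)
            self.dynamics = nn.Linear(latent_dim, latent_dim)  # learned virtual environment (latent model)
            self.value = nn.Linear(latent_dim, 1)

        def forward(self, obs):                                # obs: (batch, obs_dim)
            z = torch.tanh(self.encoder(obs))
            values = [self.value(z)]
            for _ in range(self.horizon):                      # imagine a few future latent states
                z = torch.tanh(self.dynamics(z))
                values.append(self.value(z))
            return torch.stack(values).mean(dim=0)             # foresight-aware state value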
Value Function Factorisation with Hypergraph Convolution for Cooperative Multi-agent Reinforcement Learning
Bai, Yunpeng, Gong, Chen, Zhang, Bin, Fan, Guoliang, Hou, Xinwen
Cooperation between agents in a multi-agent system (MAS) has become a hot topic in recent years, and many algorithms based on centralized training with decentralized execution (CTDE), such as VDN and QMIX, have been proposed. However, these methods disregard the information hidden in the individual action values. In this paper, we propose HyperGraph CoNvolution MIX (HGCN-MIX), a method that combines hypergraph convolution with value decomposition. By treating action values as signals, HGCN-MIX aims to explore the relationship between these signals via a self-learned hypergraph. Experimental results show that HGCN-MIX matches or surpasses state-of-the-art techniques on the StarCraft II multi-agent challenge (SMAC) benchmark in various scenarios, notably those with a large number of agents.
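For reference, the standard hypergraph convolution step (Feng et al., 2019) that this line of work builds on can be written compactly; the snippet below applies it to per-agent value signals, with hyperedge weights fixed to the identity and the incidence matrix H assumed to be learned elsewhere (e.g. self-learned, as described above).

    import torch

    def hypergraph_conv(x, H, theta):
        """x: (n_agents, d) signals; H: (n_agents, n_edges) incidence; theta: (d, d_out)."""
        d_v = H.sum(dim=1).clamp(min=1e-6)              # node degrees
        d_e = H.sum(dim=0).clamp(min=1e-6)              # hyperedge degrees
        Dv = torch.diag(d_v.pow(-0.5))
        De = torch.diag(d_e.pow(-1.0))
        return torch.relu(Dv @ H @ De @ H.t() @ Dv @ x @ theta)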
HAVEN: Hierarchical Cooperative Multi-Agent Reinforcement Learning with Dual Coordination Mechanism
Xu, Zhiwei, Bai, Yunpeng, Zhang, Bin, Li, Dapeng, Fan, Guoliang
Multi-agent reinforcement learning often suffers from the exponentially larger action space caused by a large number of agents. In this paper, we propose HAVEN, a novel value decomposition framework based on hierarchical reinforcement learning for fully cooperative multi-agent problems. To address the instabilities that arise from the concurrent optimization of high-level and low-level policies, as well as the concurrent optimization among agents, we introduce a dual coordination mechanism of inter-layer strategies and inter-agent strategies. HAVEN requires neither domain knowledge nor pretraining and can be applied to any value decomposition variant. Our method achieves superior results to many baselines on StarCraft II micromanagement tasks and offers an efficient solution to multi-agent hierarchical reinforcement learning in fully cooperative scenarios.
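As a schematic (not HAVEN's actual dual coordination mechanism), a two-level control loop of the kind described above might look as follows, with the high level re-planning per-agent sub-goals every k steps and goal-conditioned low-level policies acting at every step; the environment interface and the k-step schedule are assumptions.

    def hierarchical_episode(env, high_policy, low_policies, k=5, max_steps=200):
        """Schematic two-level loop: high level sets sub-goals, low level acts."""
        obs = env.reset()                                  # list of per-agent observations
        goals = None
        for t in range(max_steps):
            if t % k == 0:
                goals = high_policy(obs)                   # one sub-goal per agent
            actions = [pi(o, g) for pi, o, g in zip(low_policies, obs, goals)]
            obs, reward, done, _ = env.step(actions)       # low level acts at every step
            if done:
                break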
Combining Policy Evaluation and Policy Improvement in a Unified f-Divergence Framework
Gong, Chen, He, Qiang, Bai, Yunpeng, Chen, Xiaoyu, Hou, Xinwen, Liu, Yu, Fan, Guoliang
The framework of deep reinforcement learning (DRL) provides a powerful and widely applicable mathematical formalization for sequential decision-making. In this paper, we start by studying the f-divergence between the learning policy and the sampling policy and derive a novel DRL framework, termed f-Divergence Reinforcement Learning (FRL). We highlight that the policy evaluation and policy improvement phases are induced by minimizing the f-divergence between the learning policy and the sampling policy, which is distinct from the conventional DRL objective of maximizing the expected cumulative rewards. Moreover, via the Fenchel conjugate, we convert this framework into a saddle-point optimization problem with a specific f function, which consists of policy evaluation and policy improvement, and from it we derive new policy evaluation and policy improvement methods. Our framework may give new insights for analyzing DRL algorithms. FRL offers two advantages: (1) the policy evaluation and policy improvement processes are derived simultaneously from the f-divergence; (2) the overestimation issue of the value function is alleviated. To evaluate the effectiveness of the FRL framework, we conduct experiments on Atari 2600 video games, which show that our framework matches or surpasses the DRL algorithms we tested.
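For context, the f-divergence between the learning policy $\pi$ and the sampling policy $\mu$, together with its Fenchel-conjugate (variational) representation that gives rise to a saddle-point problem, takes the standard form below; this is the textbook identity, not necessarily the paper's exact objective:
\[
D_f(\pi \,\|\, \mu) \;=\; \mathbb{E}_{a\sim\mu}\!\left[f\!\left(\frac{\pi(a\mid s)}{\mu(a\mid s)}\right)\right]
\;=\; \sup_{T}\; \mathbb{E}_{a\sim\pi}\big[T(s,a)\big] \;-\; \mathbb{E}_{a\sim\mu}\big[f^{*}\!\big(T(s,a)\big)\big],
\]
where $f$ is convex with $f(1)=0$ and $f^{*}(t)=\sup_{u}\{ut - f(u)\}$ is its Fenchel conjugate.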
SIDE: I Infer the State I Want to Learn
Xu, Zhiwei, Bai, Yunpeng, Li, Dapeng, Zhang, Bin, Fan, Guoliang
As one of the solutions to the Dec-POMDP problem, the value decomposition method has achieved good results recently. However, most value decomposition methods require the global state during training, which is not feasible in scenarios where the global state cannot be obtained. Therefore, we propose a novel value decomposition framework, named State Inference for value DEcomposition (SIDE), which eliminates the need to know the true state during training.
Learning to Coordinate via Multiple Graph Neural Networks
Xu, Zhiwei, Zhang, Bin, Bai, Yunpeng, Li, Dapeng, Fan, Guoliang
Collaboration between agents has gradually become an important topic in multi-agent systems. The key is how to efficiently solve the credit assignment problem. This paper introduces MGAN, a new algorithm for cooperative multi-agent reinforcement learning that combines graph convolutional networks and value decomposition methods. MGAN learns representations of the agents from different perspectives through multiple graph networks and realizes a proper allocation of attention among all agents. We demonstrate the strong representation-learning ability of the graph networks by visualizing their outputs, thereby improving the interpretability of each agent's actions in the multi-agent system.
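As an illustration of the building block that such multiple graph networks stack, a single graph-convolution step over per-agent features is sketched below; how the adjacency between agents is obtained (e.g. learned attention weights) is an assumption here, not MGAN's specific design.

    import torch

    def gcn_layer(h, adj, weight):
        """h: (n_agents, d) agent features; adj: (n_agents, n_agents); weight: (d, d_out)."""
        a_hat = adj + torch.eye(adj.size(0))                 # add self-loops
        d = a_hat.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        a_norm = d.unsqueeze(1) * a_hat * d.unsqueeze(0)     # symmetric normalisation
        return torch.relu(a_norm @ h @ weight)               # aggregate neighbours, then transform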