Bai, Yunpeng
BAFFLE: Hiding Backdoors in Offline Reinforcement Learning Datasets
Gong, Chen, Yang, Zhou, Bai, Yunpeng, He, Junda, Shi, Jieke, Li, Kecen, Sinha, Arunesh, Xu, Bowen, Hou, Xinwen, Lo, David, Wang, Tianhao
Reinforcement learning (RL) lets an agent learn from trial-and-error experiences gathered through interaction with the environment. Recently, offline RL has become a popular RL paradigm because it avoids the cost and risk of interacting with environments: data providers share large pre-collected datasets, and others can train high-quality agents without ever interacting with the environments. This paradigm has demonstrated effectiveness in critical tasks such as robot control and autonomous driving. However, less attention has been paid to investigating the security threats to offline RL systems. This paper focuses on backdoor attacks, in which perturbations are added to the data (observations) such that the agent takes high-reward actions given normal observations but low-reward actions given observations injected with the trigger. We propose Baffle (Backdoor Attack for Offline Reinforcement Learning), an approach that automatically implants backdoors into RL agents by poisoning the offline RL dataset, and we evaluate how different offline RL algorithms react to this attack. Our experiments, conducted on four tasks and four offline RL algorithms, expose a disquieting fact: none of the existing offline RL algorithms is immune to such a backdoor attack. More specifically, Baffle modifies 10\% of the datasets for four tasks (3 robotic control and 1 autonomous driving). Agents trained on the poisoned datasets perform well in normal settings. However, when triggers are presented, the agents' performance decreases drastically, by 63.2\%, 53.9\%, 64.7\%, and 47.4\% on the four tasks on average. The backdoor persists even after fine-tuning the poisoned agents on clean datasets. We further show that the inserted backdoor is hard to detect with a popular defense method. This paper calls for the development of more effective protection for open-source offline RL datasets.
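To make the poisoning step concrete, the sketch below shows one plausible way an attacker could modify a fraction of an offline dataset: stamp a trigger pattern into selected observations, pair them with a low-performing action, and relabel the reward so the agent learns to prefer that action whenever the trigger appears. The trigger pattern, the poison rate, and the weak_policy helper are illustrative assumptions, not Baffle's exact procedure.

    import numpy as np

    def poison_offline_dataset(dataset, weak_policy, poison_rate=0.1,
                               trigger_dims=slice(0, 3), trigger_value=1.0):
        """Illustrative sketch of backdoor poisoning for an offline RL dataset.

        dataset: dict of NumPy arrays with keys 'observations', 'actions', 'rewards'.
        weak_policy: callable mapping an observation to a low-performing action.
        """
        n = len(dataset["observations"])
        idx = np.random.choice(n, size=int(poison_rate * n), replace=False)
        for i in idx:
            obs = dataset["observations"][i].copy()
            obs[trigger_dims] = trigger_value                 # stamp the trigger into the observation
            dataset["observations"][i] = obs
            dataset["actions"][i] = weak_policy(obs)          # pair the trigger with a poorly performing action
            dataset["rewards"][i] = dataset["rewards"].max()  # relabel with a high reward so the agent imitates it
        return dataset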
Temporal-spatial Correlation Attention Network for Clinical Data Analysis in Intensive Care Unit
Nie, Weizhi, Yu, Yuhe, Zhang, Chen, Song, Dan, Zhao, Lina, Bai, Yunpeng
In recent years, medical information technology has made it possible for electronic health records (EHRs) to store fairly complete clinical data, bringing health care into the era of "big data". However, medical data are often sparse and strongly correlated, which makes many clinical problems difficult to solve effectively. The rapid development of deep learning in recent years has created opportunities for using big data in healthcare. In this paper, we propose a temporal-spatial correlation attention network (TSCAN) to handle clinical characteristic prediction problems such as predicting mortality, predicting length of stay, detecting physiologic decline, and classifying phenotypes. Based on the design of the attention mechanism, our approach can effectively remove irrelevant items in the clinical data and irrelevant time nodes according to the task at hand, so as to obtain more accurate predictions. Our method can also identify key clinical indicators of important outcomes, which can be used to improve treatment options. Our experiments use the publicly available Medical Information Mart for Intensive Care (MIMIC-IV) database. Compared with other state-of-the-art (SOTA) prediction methods, we achieve a performance improvement of 2.0\%, reaching 90.7\% on mortality prediction and 45.1\% on length-of-stay prediction. The source code can be found at: \url{https://github.com/yuyuheintju/TSCAN}.
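As a rough illustration of the idea (not the TSCAN architecture itself), the following PyTorch sketch scores each clinical variable ("spatial" attention) and each time step (temporal attention) before pooling the sequence into a single prediction; the layer sizes, gating choices, and pooling scheme are assumptions.

    import torch
    import torch.nn as nn

    class TemporalSpatialAttention(nn.Module):
        """Minimal sketch: weight clinical variables and time steps, then pool."""

        def __init__(self, n_features: int):
            super().__init__()
            self.feature_scorer = nn.Linear(n_features, n_features)  # attention over variables ("spatial")
            self.time_scorer = nn.Linear(n_features, 1)              # attention over time steps
            self.head = nn.Linear(n_features, 1)                     # e.g. an in-hospital mortality logit

        def forward(self, x):                                        # x: (batch, time, n_features)
            x = x * torch.sigmoid(self.feature_scorer(x))            # down-weight irrelevant items
            w = torch.softmax(self.time_scorer(x), dim=1)            # weight informative time steps
            return self.head((x * w).sum(dim=1))

    # usage: logits = TemporalSpatialAttention(n_features=17)(torch.randn(8, 48, 17))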
Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning
Xu, Zhiwei, Li, Dapeng, Zhang, Bin, Zhan, Yuan, Bai, Yunpeng, Fan, Guoliang
Recently, model-based agents have achieved better performance than model-free ones in single-agent environments under the same computational budget and training time. However, due to the complexity of multi-agent systems, it is difficult to learn a model of the environment, and significant compounding errors may hinder the learning process when model-based methods are applied to multi-agent tasks. This paper proposes an implicit model-based multi-agent reinforcement learning method built on value decomposition methods. Under this method, agents can interact with the learned virtual environment and evaluate the current state value according to imagined future states in the latent space, giving the agents foresight. Our approach can be applied to any multi-agent value decomposition method. The experimental results show that our method improves sample efficiency in different partially observable Markov decision process domains.
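A minimal sketch of the foresight idea: encode the current observation into a latent state, roll a learned latent model forward for a few imagined steps, and average the values of the imagined states. The module shapes, the one-step latent model, and the averaging rule are assumptions for illustration, not the paper's exact design.

    import torch
    import torch.nn as nn

    class LatentImagination(nn.Module):
        """Sketch: value estimation from imagined future latent states."""

        def __init__(self, obs_dim, latent_dim=32, horizon=3):
            super().__init__()
            self.horizon = horizon
            self.encoder = nn.Linear(obs_dim, latent_dim)
            self.dynamics = nn.Linear(latent_dim, latent_dim)  # learned virtual environment (latent model)
            self.value = nn.Linear(latent_dim, 1)

        def forward(self, obs):                                # obs: (batch, obs_dim)
            z = torch.tanh(self.encoder(obs))
            values = [self.value(z)]
            for _ in range(self.horizon):                      # imagine a few future latent states
                z = torch.tanh(self.dynamics(z))
                values.append(self.value(z))
            return torch.stack(values).mean(dim=0)             # foresight-aware state value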
Value Function Factorisation with Hypergraph Convolution for Cooperative Multi-agent Reinforcement Learning
Bai, Yunpeng, Gong, Chen, Zhang, Bin, Fan, Guoliang, Hou, Xinwen
Cooperation between agents in a multi-agent system (MAS) has become a hot topic in recent years, and many algorithms based on centralized training with decentralized execution (CTDE), such as VDN and QMIX, have been proposed. However, these methods disregard the information hidden in the individual action values. In this paper, we propose HyperGraph CoNvolution MIX (HGCN-MIX), a method that combines hypergraph convolution with value decomposition. By treating action values as signals, HGCN-MIX aims to explore the relationship between these signals via a self-learned hypergraph. Experimental results show that HGCN-MIX matches or surpasses state-of-the-art techniques on the StarCraft II multi-agent challenge (SMAC) benchmark in various scenarios, notably those with a large number of agents.
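For reference, the standard hypergraph convolution step (Feng et al., 2019) that this line of work builds on can be written compactly; the snippet below applies it to per-agent value signals, with hyperedge weights fixed to the identity and the incidence matrix H assumed to be learned elsewhere (e.g. self-learned, as described above).

    import torch

    def hypergraph_conv(x, H, theta):
        """x: (n_agents, d) signals; H: (n_agents, n_edges) incidence; theta: (d, d_out)."""
        d_v = H.sum(dim=1).clamp(min=1e-6)              # node degrees
        d_e = H.sum(dim=0).clamp(min=1e-6)              # hyperedge degrees
        Dv = torch.diag(d_v.pow(-0.5))
        De = torch.diag(d_e.pow(-1.0))
        return torch.relu(Dv @ H @ De @ H.t() @ Dv @ x @ theta)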
HAVEN: Hierarchical Cooperative Multi-Agent Reinforcement Learning with Dual Coordination Mechanism
Xu, Zhiwei, Bai, Yunpeng, Zhang, Bin, Li, Dapeng, Fan, Guoliang
Multi-agent reinforcement learning often suffers from the exponentially larger action space caused by a large number of agents. In this paper, we propose HAVEN, a novel value decomposition framework based on hierarchical reinforcement learning for fully cooperative multi-agent problems. To address the instabilities that arise from the concurrent optimization of high-level and low-level policies, as well as the concurrent optimization among agents, we introduce a dual coordination mechanism of inter-layer strategies and inter-agent strategies. HAVEN requires neither domain knowledge nor pretraining and can be applied to any value decomposition variant. Our method achieves superior results to many baselines on StarCraft II micromanagement tasks and offers an efficient solution to multi-agent hierarchical reinforcement learning in fully cooperative scenarios.
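As a schematic (not HAVEN's actual dual coordination mechanism), a two-level control loop of the kind described above might look as follows, with the high level re-planning per-agent sub-goals every k steps and goal-conditioned low-level policies acting at every step; the environment interface and the k-step schedule are assumptions.

    def hierarchical_episode(env, high_policy, low_policies, k=5, max_steps=200):
        """Schematic two-level loop: high level sets sub-goals, low level acts."""
        obs = env.reset()                                  # list of per-agent observations
        goals = None
        for t in range(max_steps):
            if t % k == 0:
                goals = high_policy(obs)                   # one sub-goal per agent
            actions = [pi(o, g) for pi, o, g in zip(low_policies, obs, goals)]
            obs, reward, done, _ = env.step(actions)       # low level acts at every step
            if done:
                break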
Combining Policy Evaluation and Policy Improvement in a Unified f-Divergence Framework
Gong, Chen, He, Qiang, Bai, Yunpeng, Chen, Xiaoyu, Hou, Xinwen, Liu, Yu, Fan, Guoliang
The framework of deep reinforcement learning (DRL) provides a powerful and widely applicable mathematical formalization for sequential decision-making. In this paper, we start by studying the f-divergence between the learning policy and the sampling policy and derive a novel DRL framework, termed f-Divergence Reinforcement Learning (FRL). We highlight that the policy evaluation and policy improvement phases are induced by minimizing the f-divergence between the learning policy and the sampling policy, which is distinct from the conventional DRL objective of maximizing the expected cumulative rewards. Moreover, via the Fenchel conjugate, we convert this framework into a saddle-point optimization problem with a specific f function, which consists of policy evaluation and policy improvement, and from it we derive new policy evaluation and policy improvement methods. Our framework may give new insights for analyzing DRL algorithms. FRL offers two advantages: (1) the policy evaluation and policy improvement processes are derived simultaneously from the f-divergence; (2) the overestimation issue of the value function is alleviated. To evaluate the effectiveness of the FRL framework, we conduct experiments on Atari 2600 video games, which show that our framework matches or surpasses the DRL algorithms we tested.
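For context, the f-divergence between the learning policy $\pi$ and the sampling policy $\mu$, together with its Fenchel-conjugate (variational) representation that gives rise to a saddle-point problem, takes the standard form below; this is the textbook identity, not necessarily the paper's exact objective:
\[
D_f(\pi \,\|\, \mu) \;=\; \mathbb{E}_{a\sim\mu}\!\left[f\!\left(\frac{\pi(a\mid s)}{\mu(a\mid s)}\right)\right]
\;=\; \sup_{T}\; \mathbb{E}_{a\sim\pi}\big[T(s,a)\big] \;-\; \mathbb{E}_{a\sim\mu}\big[f^{*}\!\big(T(s,a)\big)\big],
\]
where $f$ is convex with $f(1)=0$ and $f^{*}(t)=\sup_{u}\{ut - f(u)\}$ is its Fenchel conjugate.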
SIDE: I Infer the State I Want to Learn
Xu, Zhiwei, Bai, Yunpeng, Li, Dapeng, Zhang, Bin, Fan, Guoliang
As one of the solutions to the Dec-POMDP problem, the value decomposition method has achieved good results recently. However, most value decomposition methods require the global state during training, which is not feasible in scenarios where the global state cannot be obtained. Therefore, we propose a novel value decomposition framework, named State Inference for value DEcomposition (SIDE), which eliminates the need to know the true state during training.
Learning to Coordinate via Multiple Graph Neural Networks
Xu, Zhiwei, Zhang, Bin, Bai, Yunpeng, Li, Dapeng, Fan, Guoliang
Collaboration between agents has gradually become an important topic in multi-agent systems. The key is how to efficiently solve the credit assignment problem. This paper introduces MGAN, a new algorithm for cooperative multi-agent reinforcement learning that combines graph convolutional networks and value decomposition methods. MGAN learns representations of the agents from different perspectives through multiple graph networks and realizes a proper allocation of attention among all agents. We demonstrate the strong representation-learning ability of the graph networks by visualizing their outputs, thereby improving the interpretability of each agent's actions in the multi-agent system.
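As an illustration of the building block that such multiple graph networks stack, a single graph-convolution step over per-agent features is sketched below; how the adjacency between agents is obtained (e.g. learned attention weights) is an assumption here, not MGAN's specific design.

    import torch

    def gcn_layer(h, adj, weight):
        """h: (n_agents, d) agent features; adj: (n_agents, n_agents); weight: (d, d_out)."""
        a_hat = adj + torch.eye(adj.size(0))                 # add self-loops
        d = a_hat.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        a_norm = d.unsqueeze(1) * a_hat * d.unsqueeze(0)     # symmetric normalisation
        return torch.relu(a_norm @ h @ weight)               # aggregate neighbours, then transform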