Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations
arXiv.org Artificial Intelligence, Mar-6-2024
Reinforcement learning (RL) has achieved phenomenal success in various domains. However, its data-driven nature also introduces new vulnerabilities that can be exploited by malicious opponents. Recent work shows that a well-trained RL agent can be easily manipulated by strategically perturbing its state observations at the test stage. Existing solutions either introduce a regularization term to improve the smoothness of the trained policy against perturbations or alternately train the agent's policy and the attacker's policy. However, the former does not provide sufficient protection against strong attacks, while the latter is computationally prohibitive for large environments. In this work, we propose a new robust RL algorithm that derives a pessimistic policy to safeguard against the agent's uncertainty about true states. The approach is further enhanced with belief state inference and diffusion-based state purification to reduce that uncertainty. Empirical results show that our approach obtains superb performance under strong attacks and has training overhead comparable to that of regularization-based methods.

As one of the major paradigms for data-driven control, reinforcement learning (RL) provides a principled and solid framework for sequential decision-making under uncertainty. By incorporating the approximation capacity of deep neural networks, deep reinforcement learning (DRL) has found impressive applications in robotics (Levine et al., 2016), large generative models (OpenAI, 2023), and autonomous driving (Kiran et al., 2021), and has obtained super-human performance in tasks such as Go (Silver et al., 2016) and Gran Turismo (Wurman et al., 2022).

However, an RL agent is subject to various types of attacks, including state and reward perturbation, action space manipulation, and model inference and poisoning (Ilahi et al., 2022). Recent studies have shown that an RL agent can be manipulated by poisoning its observations (Huang et al., 2017; Zhang et al., 2020a) and reward signals (Huang & Zhu, 2019), and that a well-trained RL agent can be easily defeated by a malicious opponent behaving unexpectedly (Gleave et al., 2020). In particular, recent research has demonstrated the brittleness (Zhang et al., 2020a; Sun et al., 2021) of existing RL algorithms in the face of adversarial state perturbations, where a malicious agent strategically and stealthily perturbs the observations of a trained RL agent, causing a significant loss of cumulative reward. Such an attack can be implemented in practice by exploiting defects in the agent's perception components, e.g., sensors and communication channels. This raises significant concerns when applying RL techniques in security- and safety-critical domains.
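To make the pessimism idea concrete, the sketch below implements a minimal tabular variant of the maximin principle behind such a defense: when the observation may be perturbed, the agent acts and bootstraps against the worst-case value over an uncertainty set of candidate true states. This is an illustrative simplification under assumed details, not the paper's deep RL algorithm; the `uncertainty_set` helper, the `eps` radius, and the toy chain environment are inventions for the example.

```python
import numpy as np

def uncertainty_set(obs, states, eps=1.0):
    """Hypothetical helper: candidate true states consistent with a
    (possibly perturbed) observation, here an eps-ball in a 1-D state space."""
    return [s for s in states if abs(s - obs) <= eps]

def pessimistic_action(Q, obs, states, eps=1.0):
    """Maximin action choice: maximize the worst-case action value over
    every state the true state could be, given the observation."""
    candidates = uncertainty_set(obs, states, eps)
    worst_q = np.min([Q[s] for s in candidates], axis=0)  # per-action worst case
    return int(np.argmax(worst_q))

def pessimistic_q_update(Q, s, a, r, s_next, states,
                         eps=1.0, alpha=0.1, gamma=0.99):
    """Tabular Q-learning step with a pessimistic bootstrap target:
    the next-state value is the worst case over the uncertainty set."""
    candidates = uncertainty_set(s_next, states, eps)
    v_next = np.min([np.max(Q[c]) for c in candidates])  # pessimistic value
    Q[s][a] += alpha * (r + gamma * v_next - Q[s][a])

if __name__ == "__main__":
    # Toy 5-state chain: action 1 moves right, action 0 moves left;
    # reaching state 4 yields reward 1 and resets the episode.
    states, n_actions = list(range(5)), 2
    Q = {s: np.zeros(n_actions) for s in states}
    rng = np.random.default_rng(0)
    s = 2
    for _ in range(2000):
        if rng.random() < 0.1:                     # epsilon-greedy exploration
            a = int(rng.integers(n_actions))
        else:
            a = pessimistic_action(Q, s, states)
        s_next = min(max(s + (1 if a == 1 else -1), 0), 4)
        r = 1.0 if s_next == 4 else 0.0
        pessimistic_q_update(Q, s, a, r, s_next, states)
        s = 0 if s_next == 4 else s_next
    print(pessimistic_action(Q, 2, states))        # greedy pessimistic action at state 2
```

Intuitively, acting on the minimum over the candidate set trades some nominal return for robustness: any perturbation that keeps the true state inside the set cannot push the realized value below the pessimistic estimate. The paper's full method additionally shrinks this uncertainty set via belief state inference and diffusion-based purification.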