Cui, Jing
BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning
Cui, Jing, Han, Yufei, Ma, Yuzhe, Jiao, Jianbin, Zhang, Junge
Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success. However, these methods suffer from high attack costs and increased detectability. In this work, we propose a novel approach, BadRL, which focuses on conducting highly sparse backdoor poisoning efforts during training and testing while maintaining successful attacks. Our algorithm, BadRL, strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection. In contrast to the previous methods that utilize sample-agnostic trigger patterns, BadRL dynamically generates distinct trigger patterns based on targeted state observations, thereby enhancing its effectiveness. Theoretical analysis shows that the targeted backdoor attack is always viable and remains stealthy under specific assumptions. Empirical results on various classic RL tasks illustrate that BadRL can substantially degrade the performance of a victim agent with minimal poisoning efforts 0.003% of total training steps) during training and infrequent attacks during testing.
Dynamic Controllability of Controllable Conditional Temporal Problems with Uncertainty
Cui, Jing, Haslum, Patrik
Dynamic Controllability (DC) of a Simple Temporal Problem with Uncertainty (STPU) uses a dynamic decision strategy, rather than a fixed schedule, to tackle temporal uncertainty. We extend this concept to the Controllable Conditional Temporal Problem with Uncertainty (CCTPU), which extends the STPU by conditioning temporal constraints on the assignment of controllable discrete variables. We define dynamic controllability of a CCTPU as the existence of a strategy that decides on both the values of discrete choice variables and the scheduling of controllable time points dynamically. This contrasts with previous work, which made a static assignment of choice variables and dynamic decisions over time points only. We propose an algorithm to find such a fully dynamic strategy. The algorithm computes the "envelope" of outcomes of temporal uncertainty in which a particular assignment of discrete variables is feasible, and aggregates these over all choices. When an aggregated envelope covers all uncertain situations of the CCTPU, the problem is dynamically controllable. However, the algorithm is complete only under certain assumptions. Experiments on an existing set of CCTPU benchmarks show that there are cases in which making both discrete and temporal decisions dynamically it is feasible to satisfy the problem constraints while assigning the discrete variables statically it is not.
Resolving Over-Constrained Temporal Problems with Uncertainty through Conflict-Directed Relaxation
Yu, Peng, Williams, Brian, Fang, Cheng, Cui, Jing, Haslum, Patrik
Over-subscription, that is, being assigned too many things to do, is commonly encountered in temporal scheduling problems. As human beings, we often want to do more than we can actually do, and underestimate how long it takes to perform each task. Decision makers can benefit from aids that identify when these failure situations are likely, the root causes of these failures, and resolutions to these failures. In this paper, we present a decision assistant that helps users resolve over-subscribed temporal problems. The system works like an experienced advisor that can quickly identify the cause of failure underlying temporal problems and compute resolutions. The core of the decision assistant is the Best-first Conflict-Directed Relaxation (BCDR) algorithm, which can detect conflicting sets of constraints within temporal problems, and computes continuous relaxations for them that weaken constraints to the minimum extent, instead of removing them completely. BCDR is an extension to the Conflict-Directed A* algorithm, first developed in the model-based reasoning community to compute most likely system diagnoses or reconfigurations. It generalizes the discrete conflicts and relaxations, to hybrid conflicts and relaxations, which denote minimal inconsistencies and minimal relaxations to both discrete and continuous relaxable constraints. In addition, BCDR is capable of handling temporal uncertainty, expressed as either set-bounded or probabilistic durations, and can compute preferred trade-offs between the risk of violating a schedule requirement, versus the loss of utility by weakening those requirements. BCDR has been applied to several decision support applications in different domains, including deep-sea exploration, urban travel planning and transit system management. It has demonstrated its effectiveness in helping users resolve over-subscribed scheduling problems and evaluate the robustness of existing solutions. In our benchmark experiments, BCDR has also demonstrated its efficiency on solving large-scale scheduling problems in the aforementioned domains. Thanks to its conflict-driven approach for computing relaxations, BCDR achieves one to two orders of magnitude improvements on runtime performance when compared to state-of-the-art numerical solvers.