Constrained episodic reinforcement learning in concave-convex and knapsack settings
Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
Standard reinforcement learning (RL) approaches seek to maximize a scalar reward (Sutton and Barto, 1998, 2018; Schulman et al., 2015; Mnih et al., 2015), but in many settings this is insufficient because the desired properties of agent behavior are better described using constraints. For example, an autonomous vehicle should not only reach its destination, but should also respect safety, fuel-efficiency, and human-comfort constraints along the way (Le et al., 2019); a robot should not only fulfill its task, but should also control its wear and tear, for example by limiting the torque exerted on its motors (Tessler et al., 2019). Moreover, in many settings we wish to satisfy such constraints not only at deployment but already during training: a power grid, an autonomous vehicle, or real robotic hardware should avoid costly failures, in which the hardware is damaged or humans are harmed, while the agent is still learning (Leike et al., 2017; Ray et al., 2020). Constraints are also key in other sequential decision-making applications, such as dynamic pricing with limited supply (e.g., Besbes and Zeevi, 2009; Babaioff et al., 2015), scheduling of resources on a computer cluster (Mao et al., 2016), and imitation learning, where the goal is to stay close to expert behavior (Syed and Schapire, 2007; Ziebart et al., 2008; Sun et al., 2019).
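For reference, a common baseline formalization of such requirements, which the concave-convex and knapsack settings in the title broadly generalize, is the constrained episodic MDP objective: maximize expected cumulative reward subject to budget constraints on expected cumulative costs. The notation below (horizon H, reward r, costs c_i, thresholds xi_i) is illustrative and not taken verbatim from the paper:

\[
\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{h=1}^{H} r(s_h, a_h)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{h=1}^{H} c_i(s_h, a_h)\right] \le \xi_i,
\quad i = 1, \dots, d,
\]

where \(\pi\) is the agent's policy, \(r\) is the scalar reward, and each \(c_i\) is a constraint cost (e.g., torque exerted or fuel consumed) with budget \(\xi_i\).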
arXiv.org Artificial Intelligence
Jun-9-2020