Hierarchies of Reward Machines
Daniel Furelos-Blanco, Mark Law, Anders Jonsson, Krysia Broda, Alessandra Russo
–arXiv.org Artificial Intelligence
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine whose edges encode subgoals of the task using high-level events. The structure of RMs enables the decomposition of a task into simpler and independently solvable subtasks that help tackle long-horizon and/or sparse reward tasks. We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs, thus composing a hierarchy of RMs.

Hierarchical reinforcement learning (HRL; Barto & Mahadevan, 2003) frameworks, such as options (Sutton et al., 1999), have been used to exploit RMs by learning policies at two levels of abstraction: (i) select a formula (i.e., subgoal) from a given RM state, and (ii) select an action to (eventually) satisfy the chosen formula (Toro Icarte et al., 2018; Furelos-Blanco et al., 2021). The subtask decomposition powered by HRL enables learning at multiple scales simultaneously, and eases the handling of long-horizon and sparse reward tasks. In addition, several works have considered the problem of learning the RMs themselves from interaction (e.g., Toro Icarte et al., 2019; Xu et al., 2020; Furelos-Blanco et al., 2021; Hasanbeig et al., 2021).
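To make the core object concrete, the sketch below shows one plausible way to encode an RM as a labelled finite-state machine whose edges may additionally call another RM, as in the proposed hierarchy. All names (`RewardMachine`, `transitions`, the toy "key/door" task) are illustrative assumptions for exposition, not the paper's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class RewardMachine:
    """Finite-state machine whose edges encode subgoals via high-level events.

    transitions[state][formula] = (next_state, reward, callee)
    A non-None `callee` marks a hierarchical edge: intuitively, the transition
    completes only once the called RM reaches its accepting state.
    """
    name: str
    initial_state: str
    accepting_state: str
    transitions: Dict[
        str, Dict[str, Tuple[str, float, Optional["RewardMachine"]]]
    ] = field(default_factory=dict)

    def step(self, state: str, satisfied_formula: str) -> Tuple[str, float]:
        """Advance on a flat (non-calling) edge; stay put if no edge matches."""
        next_state, reward, _callee = self.transitions.get(state, {}).get(
            satisfied_formula, (state, 0.0, None)
        )
        return next_state, reward


# Toy usage: a root RM delegates the "get the key" subtask to a lower-level RM,
# then gives reward 1.0 once the door is reached.
get_key = RewardMachine(
    name="get_key",
    initial_state="u0",
    accepting_state="u1",
    transitions={"u0": {"key": ("u1", 0.0, None)}},
)
root = RewardMachine(
    name="root",
    initial_state="v0",
    accepting_state="v2",
    transitions={
        "v0": {"call(get_key)": ("v1", 0.0, get_key)},  # hierarchical edge
        "v1": {"door": ("v2", 1.0, None)},
    },
)

print(get_key.step("u0", "key"))  # ('u1', 0.0)
print(root.step("v1", "door"))    # ('v2', 1.0)
```

Under this reading, an HRL agent would pick a formula (or a call) from the current RM state at the higher level and learn a low-level policy to satisfy it, matching the two levels of abstraction described above.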
Jun-4-2023