learning macro-action
Strategic Attentive Writer for Learning Macro-Actions
We present a novel deep recurrent neural network architecture that learns to build implicit plans in an end-to-end manner purely by interacting with an environment in a reinforcement learning setting. The network builds an internal plan, which is continuously updated upon observation of the next input from the environment. It can also partition this internal representation into contiguous sub-sequences by learning for how long the plan can be committed to -- i.e. followed without replanning. Combining these properties, the proposed model, dubbed STRategic Attentive Writer (STRAW), can learn high-level, temporally abstracted macro-actions of varying lengths that are solely learnt from data without any prior information. These macro-actions enable both structured exploration and economic computation. We experimentally demonstrate that STRAW delivers strong improvements on several ATARI games by employing temporally extended planning strategies (e.g.
Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time
Veronese, Celeste, Meli, Daniele, Farinelli, Alessandro
The most popular and effective approaches to solving Partially Observable Markov Decision Processes (POMDPs, Kaelbling et al. (1998)) online, e.g., Partially Observable Monte Carlo Planning (POMCP) by Silver and Veness (2010) and Determinized Sparse Partially Observable Tree (DESPOT) by Ye et al. (2017), rely on Monte Carlo Tree Search (MCTS). These approaches are based on online simulations performed in a simulation environment (i.e. a black-box twin of the real POMDP environment) and estimate the value of actions. However, they require domain-specific policy heuristics, suggesting best actions at each state, for efficient exploration. Macro-actions (He et al. (2011); Bertolucci et al. (2021)) are popular policy heuristics that are particularly efficient for long planning horizons. A macro-action is essentially a sequence of suggested actions from a given state that can effectively guide the simulation phase towards actions with high utilities. However, such heuristics are heavily dependent on domain features and are typically handcrafted for each specific domain. Defining these heuristics is an arduous process that requires significant domain knowledge, especially in complex domains. An alternative approach, like the one by Cai and Hsu (2022), is to learn such heuristics via neural networks, which are, however, uninterpretable and data-inefficient. This paper extends the methodology proposed by Meli et al. (2024) to the learning, via Inductive Logic Programming (ILP, Muggleton (1991)), of Event Calculus (EC) theories
- Transportation (0.34)
- Leisure & Entertainment > Games (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
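The POMDP abstract above describes a macro-action as a sequence of suggested actions that guides the simulation phase. A minimal sketch of that idea follows, assuming a toy corridor simulator and a rollout that follows the macro-action first and falls back to random actions afterwards. The names `rollout` and `corridor_step` are hypothetical and are not taken from POMCP, DESPOT, or the paper.

```python
import random


def rollout(state, step, macro, actions, horizon, gamma=0.95, rng=None):
    """Estimate a discounted return by simulating `horizon` steps.

    The first len(macro) steps follow the suggested macro-action;
    any remaining steps pick a random primitive action.
    `step(state, action)` is an assumed black-box simulator that
    returns (next_state, reward, done).
    """
    rng = rng or random.Random(0)
    total, discount = 0.0, 1.0
    for t in range(horizon):
        action = macro[t] if t < len(macro) else rng.choice(actions)
        state, reward, done = step(state, action)
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total


def corridor_step(pos, action):
    """Toy simulator (an assumption): reward 1 on reaching position 3."""
    pos += 1 if action == "right" else -1
    done = pos == 3
    return pos, (1.0 if done else 0.0), done


value = rollout(0, corridor_step, ["right"] * 3, ["left", "right"],
                horizon=3, gamma=0.5)
```

Here the macro-action covers the whole horizon, so the estimate does not depend on the random fallback; with a shorter macro-action the remaining steps would be sampled.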
Reviews: Strategic Attentive Writer for Learning Macro-Actions
Summary of Recommendation: The paper introduces an original idea. Committing to a plan has been introduced before in RL, e.g., in Sutton's options literature (where no learning occurs), and Schmidhuber's hierarchical RL systems of the early 1990s, and Wiering's HQ learning, but the new approach is different. However, the formalisation and experimental section seem to lack clarity and raise several questions. In particular, the experiments don't show very convincingly that the attentional mechanism is needed (although it seems like a very nice idea) and the actual behaviour of the attention is not explored at all. I don't see this as a fatal flaw, but this is definitely problematic since the title and main thrust of the paper rely on it.
Learning Macro-Actions in Reinforcement Learning
We present a method for automatically constructing macro-actions from scratch from primitive actions during the reinforcement learning process. The overall idea is to reinforce the tendency to perform action b after action a if such a pattern of actions has been rewarded. We test the method on a bicycle task, the car-on-the-hill task, the race-track task and some grid-world tasks. For the bicycle and race-track tasks the use of macro-actions approximately halves the learning time, while for one of the grid-world tasks the learning time is reduced by a factor of 5. The method did not work for the car-on-the-hill task for reasons we discuss in the conclusion.
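The idea stated in this abstract, reinforcing the tendency to perform action b after action a when that pattern has been rewarded, can be illustrated with a small table of pairwise tendencies. This is only a sketch under assumptions: the class `PairTendency`, its additive update, and the greedy chaining are hypothetical and are not the update rule from the paper.

```python
class PairTendency:
    """Hypothetical table of 'do b after a' tendencies."""

    def __init__(self, actions, step=0.1):
        self.actions = list(actions)
        self.step = step      # assumed learning-rate-like constant
        self.pref = {}        # (a, b) -> accumulated tendency

    def reinforce(self, episode, reward):
        # Credit every consecutive pair (a, b) in the episode with the reward.
        for a, b in zip(episode, episode[1:]):
            self.pref[(a, b)] = self.pref.get((a, b), 0.0) + self.step * reward

    def suggest(self, prev_action):
        # Most reinforced follow-up; ties go to the earliest listed action.
        return max(self.actions,
                   key=lambda b: self.pref.get((prev_action, b), 0.0))

    def macro(self, start, length):
        # Chain the strongest follow-ups into a candidate macro-action.
        seq = [start]
        while len(seq) < length:
            seq.append(self.suggest(seq[-1]))
        return seq


table = PairTendency(["left", "right", "push"])
table.reinforce(["left", "push", "right"], reward=1.0)
table.reinforce(["left", "right"], reward=-1.0)
candidate = table.macro("left", 3)
```

After the two updates the pair (left, push) has a positive tendency and (left, right) a negative one, so chaining from `left` yields the rewarded pattern.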
Strategic Attentive Writer for Learning Macro-Actions
Vezhnevets, Alexander, Mnih, Volodymyr, Osindero, Simon, Graves, Alex, Vinyals, Oriol, Agapiou, John, Kavukcuoglu, Koray
We present a novel deep recurrent neural network architecture that learns to build implicit plans in an end-to-end manner purely by interacting with an environment in a reinforcement learning setting. The network builds an internal plan, which is continuously updated upon observation of the next input from the environment. It can also partition this internal representation into contiguous sub-sequences by learning for how long the plan can be committed to -- i.e. followed without replanning. Combining these properties, the proposed model, dubbed STRategic Attentive Writer (STRAW), can learn high-level, temporally abstracted macro-actions of varying lengths that are solely learnt from data without any prior information. These macro-actions enable both structured exploration and economic computation.
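The notion of committing to a plan, that is, following it for some number of steps without replanning, can be sketched as a plain control loop. This illustrates only the control flow under assumptions: `run_with_commitment` and the `replan` callback are hypothetical, and the learned recurrent network and attention mechanism of STRAW are not modelled here.

```python
def run_with_commitment(observations, replan):
    """Follow a plan for a committed number of steps, then replan.

    `replan(obs)` is an assumed callback returning
    (non-empty list of actions, number of steps to commit).
    Returns the executed actions and how many times replanning occurred.
    """
    executed, plan, remaining, replans = [], [], 0, 0
    for obs in observations:
        if remaining == 0:
            actions, commit = replan(obs)
            plan = list(actions)
            # Commit for at least one step and never beyond the plan length.
            remaining = max(1, min(commit, len(plan)))
            replans += 1
        executed.append(plan.pop(0))
        remaining -= 1
    return executed, replans


def toy_replan(obs):
    # Toy planner (an assumption): a two-step macro-action early on,
    # single-step plans afterwards.
    return (["up", "fire"], 2) if obs < 2 else (["down"], 1)


executed, replans = run_with_commitment(range(5), toy_replan)
```

With five observations the loop replans four times rather than five, because the first plan is followed for two steps; longer commitments save proportionally more planning calls.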
Learning Macro-Actions in Reinforcement Learning
We present a method for automatically constructing macro-actions from scratch from primitive actions during the reinforcement learning process. The overall idea is to reinforce the tendency to perform action b after action a if such a pattern of actions has been rewarded. We test the method on a bicycle task, the car-on-the-hill task, the racetrack task and some grid-world tasks. For the bicycle and racetrack tasks the use of macro-actions approximately halves the learning time, while for one of the grid-world tasks the learning time is reduced by a factor of 5. The method did not work for the car-on-the-hill task for reasons we discuss in the conclusion.
1 INTRODUCTION
A macro-action is a sequence of actions chosen from the primitive actions of the problem.
- North America > United States > Massachusetts (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- Europe > Denmark > Capital Region > Copenhagen (0.05)