AITopics | macro-action

Meta-learning how to Share Credit among Macro-Actions

Neural Information Processing Systems | Jun-19-2026, 03:52:02 GMT

One proposed mechanism to improve exploration in reinforcement learning is through the use of macro-actions. Paradoxically though, in many scenarios the naive addition of macro-actions does not lead to better exploration, but rather the opposite. It has been argued that this was caused by adding non-useful macros and multiple works have focused on mechanisms to discover effectively environment-specific useful macros. In this work, we take a slightly different perspective. We argue that the difficulty stems from the trade-offs between reducing the average number of decisions per episode versus increasing the size of the action space. Namely, one typically treats each potential macro-action as independent and atomic, hence strictly increasing the search space and making typical exploration strategies inefficient. To address this problem we propose a novel regularization term that exploits the relationship between actions and macro-actions to improve the credit assignment mechanism by reducing the effective dimension of the action space and, therefore, improving exploration. The term relies on a similarity matrix that is meta-learned jointly with learning the desired policy.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
North America > Canada (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Leisure & Entertainment > Sports (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Meta-learning how to Share Credit among Macro-Actions

Hosu, Ionel-Alexandru, Rebedea, Traian, Pascanu, Razvan

arXiv.org Artificial Intelligence | Jun-17-2025

One proposed mechanism to improve exploration in reinforcement learning is through the use of macro-actions. Paradoxically though, in many scenarios the naive addition of macro-actions does not lead to better exploration, but rather the opposite. It has been argued that this was caused by adding non-useful macros and multiple works have focused on mechanisms to discover effectively environment-specific useful macros. In this work, we take a slightly different perspective. We argue that the difficulty stems from the trade-offs between reducing the average number of decisions per episode versus increasing the size of the action space. Namely, one typically treats each potential macro-action as independent and atomic, hence strictly increasing the search space and making typical exploration strategies inefficient. To address this problem we propose a novel regularization term that exploits the relationship between actions and macro-actions to improve the credit assignment mechanism by reducing the effective dimension of the action space and, therefore, improving exploration. The term relies on a similarity matrix that is meta-learned jointly with learning the desired policy. We empirically validate our strategy looking at macro-actions in Atari games, and the StreetFighter II environment. Our results show significant improvements over the Rainbow-DQN baseline in all environments. Additionally, we show that the macro-action similarity is transferable to related environments. We believe this work is a small but important step towards understanding how the similarity-imposed geometry on the action space can be exploited to improve credit assignment and exploration, therefore making learning more effective.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2506.1369

Country:

North America > United States (0.28)
North America > Canada (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Monte Carlo Value Iteration with Macro-Actions

Neural Information Processing Systems | Apr-6-2023, 13:11:50 GMT

POMDP planning faces two major computational challenges: large state spaces and long planning horizons. The recently introduced Monte Carlo Value Iteration (MCVI) can tackle POMDPs with very large discrete state spaces or continuous state spaces, but its performance degrades when faced with long planning horizons. This paper presents Macro-MCVI, which extends MCVI by exploiting macro-actions for temporal abstraction. We provide sufficient conditions for Macro-MCVI to inherit the good theoretical properties of MCVI. Macro-MCVI does not require explicit construction of probabilistic models for macro-actions and is thus easy to apply in practice.

macro-action, monte carlo value iteration, state space, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.77)

Monte Carlo Value Iteration with Macro-Actions

Lim, Zhan, Sun, Lee, Hsu, David

Neural Information Processing Systems | Feb-14-2020, 22:43:32 GMT

POMDP planning faces two major computational challenges: large state spaces and long planning horizons. The recently introduced Monte Carlo Value Iteration (MCVI) can tackle POMDPs with very large discrete state spaces or continuous state spaces, but its performance degrades when faced with long planning horizons. This paper presents Macro-MCVI, which extends MCVI by exploiting macro-actions for temporal abstraction. We provide sufficient conditions for Macro-MCVI to inherit the good theoretical properties of MCVI. Macro-MCVI does not require explicit construction of probabilistic models for macro-actions and is thus easy to apply in practice.

macro-action, monte carlo value iteration, state space, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.84)

Modeling and Planning with Macro-Actions in Decentralized POMDPs

Amato, Christopher, Konidaris, George, Kaelbling, Leslie P., How, Jonathan P.

Journal of Artificial Intelligence Research | Mar-25-2019

Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized multi-agent decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent's actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions that may require different amounts of time to execute. We model macro-actions as options in a Dec-POMDP, focusing on actions that depend only on information directly available to the agent during execution. Therefore, we model systems where coordination decisions only occur at the level of deciding which macro-actions to execute. The core technical difficulty in this setting is that the options chosen by each agent no longer terminate at the same time. We extend three leading Dec-POMDP algorithms for policy generation to the macro-action case, and demonstrate their effectiveness in both standard benchmarks and a multi-robot coordination problem. The results show that our new algorithms retain agent coordination while allowing high-quality solutions to be generated for significantly longer horizons and larger state-spaces than previous Dec-POMDP methods. Furthermore, in the multi-robot domain, we show that, in contrast to most existing methods that are specialized to a particular problem class, our approach can synthesize control policies that exploit opportunities for coordination while balancing uncertainty, sensor information, and information about other agents.

agent, dec-pomdp, robot, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11418

AI Access Foundation

11418

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Mining useful Macro-actions in Planning

Castellanos-Paez, Sandra, Pellier, Damien, Fiorino, Humbert, Pesty, Sylvie

arXiv.org Artificial Intelligence | Oct-22-2018

Planning has achieved significant progress in recent years. Among the various approaches to scale up plan synthesis, the use of macro-actions has been widely explored. As a first stage towards the development of a solution to learn online macro-actions, we propose an algorithm to identify useful macro-actions based on data mining techniques. The integration in the planning search of these learned macro-actions shows significant improvements over six classical planning benchmarks. Automated planning is an area of Artificial Intelligence that comes up with the challenge of devising systems that can autonomously find a plan to reach a set of goals. In classical planning, a problem is composed of an initial state, a goal specification and a set of actions. From the initial state, if the preconditions of an action are satisfied, the action is applicable to the current state.

artificial intelligence, machine learning, sequence, (16 more...)

arXiv.org Artificial Intelligence

1810.09145

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Marvin: A Heuristic Search Planner with Online Macro-Action Learning

Coles, A. I., Smith, A. J.

arXiv.org Artificial Intelligence | Oct-12-2011

This paper describes Marvin, a planner that competed in the Fourth International Planning Competition (IPC 4). Marvin uses action-sequence-memoisation techniques to generate macro-actions, which are then used during search for a solution plan. We provide an overview of its architecture and search behaviour, detailing the algorithms used. We also empirically demonstrate the effectiveness of its features in various planning domains; in particular, the effects on performance due to the use of macro-actions, the novel features of its search behaviour, and the native support of ADL and Derived Predicates.

artificial intelligence, precondition, survey article, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.2077

1110.2736

Genre:

Overview (0.86)
Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

An Automatically Configurable Portfolio-based Planner with Macro-actions: PbP

Gerevini, Alfonso (University of Brescia) | Saetti, Alessandro (University of Brescia) | Vallati, Mauro (University of Brescia)

AAAI Conferences | Sep-19-2009

The field of automated plan generation has recently significantly advanced. However, while several powerful domain-independent planners have been developed, no one of these clearly outperforms all the others in every known benchmark domain. It would then be useful to have a multi-planner system that automatically selects and combines the most efficient planner(s) for each given domain. PbP has two variants: PbP.s focusing on speed, and PbP.q focusing on plan quality. PbP.s entered the learning track of the sixth international planning competition (IPC6), and was the overall winner of this competition track (Fern, Khardon and Tadepalli 2008). The paper includes some experimental ...

knowledge, pbp, plan quality, (13 more...)

AAAI Conferences

Nineteenth International Conference on Automated Planning and Scheduling

Country: Europe > Italy (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Marvin: A Heuristic Search Planner with Online Macro-Action Learning

Coles, A. I., Smith, A. J.

Journal of Artificial Intelligence Research | Feb-21-2007

This paper describes Marvin, a planner that competed in the Fourth International Planning Competition (IPC 4). Marvin uses action-sequence-memoisation techniques to generate macro-actions, which are then used during search for a solution plan. We provide an overview of its architecture and search behaviour, detailing the algorithms used. We also empirically demonstrate the effectiveness of its features in various planning domains; in particular, the effects on performance due to the use of macro-actions, the novel features of its search behaviour, and the native support of ADL and Derived Predicates.

heuristic value, macro-action, precondition, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.2077

AI Access Foundation

10485

Country: