macro action
Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization
Guo, Jian-Ting, Chen, Yu-Cheng, Hsieh, Ping-Chun, Ho, Kuo-Hao, Huang, Po-Wei, Wu, Ti-Rong, Wu, I-Chen
Human-like agents have long been one of the goals in pursuing artificial intelligence. Although reinforcement learning (RL) has achieved superhuman performance in many domains, relatively little attention has been focused on designing human-like RL agents. As a result, many reward-driven RL agents often exhibit unnatural behaviors compared to humans, raising concerns for both interpretability and trustworthiness. To achieve human-like behavior in RL, this paper first formulates human-likeness as trajectory optimization, where the objective is to find an action sequence that closely aligns with human behavior while also maximizing rewards, and adapts classic receding-horizon control to human-like learning as a tractable and efficient implementation. To achieve this, we introduce Macro Action Quantization (MAQ), a human-like RL framework that distills human demonstrations into macro actions via a Vector-Quantized VAE. Experiments on the D4RL Adroit benchmarks show that MAQ significantly improves human-likeness, increasing trajectory similarity scores and achieving the highest human-likeness rankings among all RL agents in the human evaluation study. Our results also demonstrate that MAQ can be easily integrated into various off-the-shelf RL algorithms, opening a promising direction for learning human-like RL agents. Our code is available at https://rlg.iis.sinica.edu.tw/papers/MAQ.
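The core quantization step the abstract describes (mapping a chunk of demonstrated actions to a discrete macro action via a VQ-VAE codebook) can be sketched minimally as a nearest-codebook lookup. This is only the inference-time lookup, not the paper's full framework; the function name, shapes, and L2 metric are illustrative assumptions, and training the encoder/decoder and codebook (e.g., with a commitment loss) is omitted:

```python
import numpy as np

def quantize_action_chunk(chunk, codebook):
    """Map a flattened action chunk to its nearest codebook entry.

    chunk:    (k * action_dim,) flattened sequence of k low-level actions
    codebook: (num_codes, k * action_dim) learned macro-action embeddings

    Returns the index of the selected macro action and the quantized chunk.
    Hypothetical sketch of the VQ lookup step only.
    """
    dists = np.linalg.norm(codebook - chunk, axis=1)  # L2 distance to each code
    idx = int(np.argmin(dists))
    return idx, codebook[idx]
```

At execution time, the discrete index is the macro action the RL agent selects, and the corresponding codebook vector is decoded back into a short sequence of low-level actions.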
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology (0.93)
- Leisure & Entertainment > Games > Computer Games (0.67)
Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces
Ma, Haitong, Nabati, Ofir, Rosenberg, Aviv, Dai, Bo, Lang, Oran, Szpektor, Idan, Boutilier, Craig, Li, Na, Mannor, Shie, Shani, Lior, Tenneholtz, Guy
Reinforcement learning (RL) struggles to scale to large, combinatorial action spaces common in many real-world problems. This paper introduces a novel framework for training discrete diffusion models as highly effective policies in these complex settings. Our key innovation is an efficient online training process that ensures stable and effective policy improvement. By leveraging policy mirror descent (PMD) to define an ideal, regularized target policy distribution, we frame the policy update as a distributional matching problem, training the expressive diffusion model to replicate this stable target. This decoupled approach stabilizes learning and significantly enhances training performance. Our method achieves state-of-the-art results and superior sample efficiency across a diverse set of challenging combinatorial benchmarks, including DNA sequence generation, RL with macro-actions, and multi-agent systems. Experiments demonstrate that our diffusion policies attain superior performance compared to other baselines.
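The abstract's policy mirror descent (PMD) target, the regularized distribution the diffusion model is trained to match, has a standard closed form over a discrete action set: the new policy is proportional to the old policy reweighted by exponentiated action values. A minimal sketch under that standard formulation (the step size `eta` and the single-state, tabular setting are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def pmd_target(policy, q_values, eta=1.0):
    """Policy mirror descent target over a discrete action set.

    policy:   (num_actions,) current policy probabilities pi_k(.|s)
    q_values: (num_actions,) action-value estimates Q(s, .)
    eta:      step size; larger eta moves further toward the greedy policy

    Returns pi_{k+1}(.|s) proportional to pi_k * exp(eta * Q), the stable
    target an expressive policy class can be trained to replicate.
    """
    logits = np.log(policy) + eta * q_values
    logits -= logits.max()              # subtract max for numerical stability
    target = np.exp(logits)
    return target / target.sum()
```

With `eta=0` the target is just the current policy; as `eta` grows it concentrates on the highest-value actions, which is what makes the distributional-matching update a controlled, regularized improvement step.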
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Asia > Macao (0.04)
- Health & Medicine (1.00)
- Leisure & Entertainment > Games (0.68)
- Education > Educational Setting > Online (0.54)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Robot Operation of Home Appliances by Reading User Manuals
Zhang, Jian, Zhang, Hanbo, Xiao, Anxing, Hsu, David
Operating home appliances, among the most common tools in every household, is a critical capability for assistive home robots. This paper presents ApBot, a robot system that operates novel household appliances by "reading" their user manuals. ApBot faces multiple challenges: (i) inferring goal-conditioned partial policies from the unstructured, textual descriptions in a user manual document, (ii) grounding the policies to the appliance in the physical world, and (iii) executing the policies reliably over potentially many steps, despite compounding errors. To tackle these challenges, ApBot constructs a structured, symbolic model of an appliance from its manual, with the help of a large vision-language model (VLM). It grounds the symbolic actions visually to control panel elements. Finally, ApBot closes the loop by updating the model based on visual feedback. Our experiments show that across a wide range of simulated and real-world appliances, ApBot achieves consistent and statistically significant improvements in task success rate, compared with state-of-the-art large VLMs used directly as control policies. These results suggest that a structured internal representation plays an important role in robust robot operation of home appliances, especially complex ones.
- Workflow (1.00)
- Research Report > New Finding (0.87)
Deep Reinforcement Learning Based Navigation with Macro Actions and Topological Maps
Hakenes, Simon, Glasmachers, Tobias
This paper addresses the challenge of navigation in large, visually complex environments with sparse rewards. We propose a method that uses object-oriented macro actions grounded in a topological map, allowing a simple Deep Q-Network (DQN) to learn effective navigation policies. The agent builds a map by detecting objects from RGBD input and selecting discrete macro actions that correspond to navigating to these objects. This abstraction drastically reduces the complexity of the underlying reinforcement learning problem and enables generalization to unseen environments. We evaluate our approach in a photorealistic 3D simulation and show that it significantly outperforms a random baseline under both immediate and terminal reward conditions. Our results demonstrate that topological structure and macro-level abstraction can enable sample-efficient learning even from pixel data.
- North America > United States > California (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > Germany (0.04)
- Leisure & Entertainment (0.68)
- Health & Medicine (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction
Luo, Baiting, Pettet, Ava, Laszka, Aron, Dubey, Abhishek, Mukhopadhyay, Ayan
If we were to apply MCTS directly to this abstracted space, we would encounter two main issues: inefficient utilization of our pre-built search space, with the search potentially diverging prematurely into unexplored regions, and difficulty in building sufficiently deep trees for high-quality long-term decision-making, particularly in areas of high stochasticity or uncertainty (Couëtoux et al., 2011). Therefore, we use progressive widening to extend MCTS so that the search tree expands incrementally. It balances the exploration of new states against the exploitation of already visited states using two hyperparameters: α ∈ [0, 1] and ϵ ∈ ℝ⁺. Let |C(s, z)| denote the number of children of the state-action pair (s, z). The key idea is to alternate between adding new child nodes and selecting among existing child nodes, depending on the number of times the state-action pair (s, z) has been visited. A new state is added to the tree if |C(s, z)| < ϵ · N(s, z)^α, where N(s, z) is the number of times the state-action pair has been visited. The hyperparameter α controls the propensity to select among existing children: α = 0 leads to always selecting among existing children, while α = 1 recovers vanilla MCTS behavior (always adding a new child). In this way, our approach efficiently utilizes the pre-built search space, prioritizing the exploration of promising macro actions while allowing incremental expansion of the search tree. This technique enables our method to make quick decisions in an anytime manner, leveraging cached information, and to further refine the planning tree when additional time is available.
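The widening rule above can be sketched in a few lines. This is a minimal, hypothetical node implementation, not the paper's code: `sample_next_state` stands in for the stochastic transition model, and the uniform choice among existing children is a simplifying assumption (a real solver would select by value estimates):

```python
import random
from collections import defaultdict

class PWNode:
    """MCTS node with progressive widening over macro actions.

    A new child state is added while |C(s, z)| < eps * N(s, z)**alpha,
    matching the widening rule described in the text.
    """
    def __init__(self, alpha=0.5, eps=1.0):
        self.alpha = alpha
        self.eps = eps
        self.children = defaultdict(list)  # macro action z -> child states C(s, z)
        self.visits = defaultdict(int)     # macro action z -> visit count N(s, z)

    def select_child(self, z, sample_next_state):
        """Return a child state for macro action z, widening if the rule allows."""
        self.visits[z] += 1
        n, kids = self.visits[z], self.children[z]
        if len(kids) < self.eps * n ** self.alpha:
            kids.append(sample_next_state())  # expand: sample a new child state
        return random.choice(kids)            # otherwise reuse an existing child
```

Setting `alpha=0` caps each action at a single child (always exploit existing children), while `alpha=1` with `eps=1` adds a new child on every visit, i.e., vanilla MCTS behavior, exactly as the text describes.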
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (15 more...)
- Leisure & Entertainment > Games (0.67)
- Health & Medicine > Therapeutic Area > Immunology (0.48)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.30)
Reviews: Strategic Attentive Writer for Learning Macro-Actions
Summary of Recommendation: The paper introduces an original idea. Committing to a plan has been introduced before in RL, e.g., in Sutton's options literature (where no learning occurs), and Schmidhuber's hierarchical RL systems of the early 1990s, and Wiering's HQ learning, but the new approach is different. However, the formalisation and experimental section seem to lack clarity and raise several questions. In particular, the experiments don't show very convincingly that the attentional mechanism is needed (although it seems like a very nice idea) and the actual behaviour of the attention is not explored at all. I don't see this as a fatal flaw, but this is definitely problematic since the title and main thrust of the paper rely on it.
Multi-agent reinforcement learning strategy to maximize the lifetime of Wireless Rechargeable Sensor Networks
The thesis proposes a generalized charging framework for multiple mobile chargers (MCs) to maximize the network lifetime and ensure target coverage and connectivity in large-scale Wireless Rechargeable Sensor Networks (WRSNs). Moreover, a multi-point charging model is leveraged to enhance charging efficiency, where an MC can charge multiple sensors simultaneously at each charging location. The thesis proposes an effective Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) model that promotes MC cooperation and detects optimal charging locations based on real-time network information. Furthermore, the proposal allows reinforcement learning algorithms to be applied to different networks without requiring extensive retraining. To solve the Dec-POSMDP model, the thesis proposes an Asynchronous Multi-Agent Reinforcement Learning algorithm (AMAPPO) based on the Proximal Policy Optimization (PPO) algorithm.
- Asia > Vietnam > Hanoi > Hanoi (0.04)
- North America > United States > California > Yolo County > Davis (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Overview (1.00)
- Research Report > Promising Solution (0.45)
- Information Technology (1.00)
- Health & Medicine (1.00)
- Energy (1.00)
- Leisure & Entertainment > Games (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Scaling Long-Horizon Online POMDP Planning via Rapid State Space Sampling
Liang, Yuanchu, Kim, Edward, Thomason, Wil, Kingston, Zachary, Kurniawati, Hanna, Kavraki, Lydia E.
Partially Observable Markov Decision Processes (POMDPs) are a general and principled framework for motion planning under uncertainty. Despite tremendous improvement in the scalability of POMDP solvers, long-horizon POMDPs (e.g., $\geq15$ steps) remain difficult to solve. This paper proposes a new approximate online POMDP solver, called Reference-Based Online POMDP Planning via Rapid State Space Sampling (ROP-RaS3). ROP-RaS3 uses novel extremely fast sampling-based motion planning techniques to sample the state space and generate a diverse set of macro actions online, which are then used to bias belief-space sampling and infer high-quality policies without requiring exhaustive enumeration of the action space -- a fundamental constraint for modern online POMDP solvers. ROP-RaS3 is evaluated on various long-horizon POMDPs, including a problem with a planning horizon of more than 100 steps and a problem with a 15-dimensional state space that requires more than 20 lookahead steps. In all of these problems, ROP-RaS3 substantially outperforms other state-of-the-art methods, in some cases severalfold.
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- North America > United States > Texas > Harris County > Houston (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Chai, Yekun, Sun, Haoran, Fang, Huang, Wang, Shuohuan, Sun, Yu, Wu, Hua
Reinforcement learning from human feedback (RLHF) has demonstrated effectiveness in aligning large language models (LLMs) with human preferences. However, token-level RLHF suffers from the credit assignment problem over long sequences, where delayed rewards make it challenging for the model to discern which actions contributed to successful outcomes. This hinders learning efficiency and slows convergence. In this paper, we propose MA-RLHF, a simple yet effective RLHF framework that incorporates macro actions -- sequences of tokens or higher-level language constructs -- into the learning process. By operating at this higher level of abstraction, our approach reduces the temporal distance between actions and rewards, facilitating faster and more accurate credit assignment. This results in more stable policy gradient estimates and enhances learning efficiency within each episode, all without increasing computational complexity during training or inference. We validate our approach through extensive experiments across various model sizes and tasks, including text summarization, dialogue generation, question answering, and program synthesis. Our method achieves substantial performance improvements over standard RLHF, with performance gains of up to 30% in text summarization and code generation, 18% in dialogue, and 8% in question answering tasks. Notably, our approach reaches parity with vanilla RLHF 1.7x to 2x faster in terms of training time and continues to outperform it with further training. We will make our code and data publicly available at https://github.com/ernie-research/MA-RLHF .
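The abstract's core idea, treating "sequences of tokens" as single macro actions to shorten the credit-assignment horizon, can be illustrated with the simplest possible instantiation: fixed-length chunking. The paper also mentions higher-level language constructs as macro actions, so this fixed-span grouping is only one assumed strategy, and the function name is illustrative:

```python
def to_macro_actions(tokens, span=3):
    """Group a token sequence into fixed-length macro actions.

    Shortening the action sequence from len(tokens) decisions to
    ceil(len(tokens) / span) decisions reduces the temporal distance
    between each action and the delayed (often terminal) reward,
    which is the credit-assignment benefit described above.
    """
    return [tuple(tokens[i:i + span]) for i in range(0, len(tokens), span)]
```

Policy-gradient updates would then assign advantages per macro action rather than per token, at no extra cost during training or inference since the underlying token generation is unchanged.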
- Europe > Italy (0.14)
- Europe > Austria > Vienna (0.14)
- North America > Mexico (0.04)
- (11 more...)
- Health & Medicine > Consumer Health (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)
Deep Reinforcement Learning for Decentralized Multi-Robot Exploration With Macro Actions
Tan, Aaron Hao, Bejarano, Federico Pizarro, Zhu, Yuhan, Ren, Richard, Nejat, Goldie
Cooperative multi-robot teams need to be able to explore cluttered and unstructured environments while dealing with communication dropouts that prevent them from exchanging local information to maintain team coordination. Therefore, robots need to consider high-level teammate intentions during action selection. In this letter, we present the first Macro Action Decentralized Exploration Network (MADE-Net), which uses multi-agent deep reinforcement learning (DRL) to address the challenges of communication dropouts during multi-robot exploration in unseen, unstructured, and cluttered environments. In simulated robot team exploration experiments, MADE-Net outperformed classical and DRL benchmark methods in terms of computation time, total travel distance, number of local interactions between robots, and exploration rate across various degrees of communication dropouts. A scalability study in 3D environments showed a decrease in exploration time with MADE-Net as team and environment sizes increased. The experiments presented highlight the effectiveness and robustness of our method.
- Leisure & Entertainment > Games (0.55)
- Consumer Products & Services > Travel (0.54)