AITopics | memory augmented policy optimization

Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

Neural Information Processing Systems, Mar-17-2026, 02:07:23 GMT

We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of the policy gradient estimate. MAPO is applicable to deterministic environments with discrete actions, such as structured prediction and combinatorial optimization tasks. We express the expected return objective as a weighted sum of two terms: an expectation over the high-reward trajectories inside the memory buffer, and a separate expectation over trajectories outside the buffer. To make MAPO an efficient algorithm, we propose: (1) memory weight clipping to accelerate and stabilize training; (2) systematic exploration to discover high-reward trajectories; (3) distributed sampling from inside and outside the memory buffer to scale up training. MAPO improves the sample efficiency and robustness of policy gradient methods, especially on tasks with sparse rewards. We evaluate MAPO on weakly supervised program synthesis from natural language (semantic parsing). On the WikiTableQuestions benchmark, we improve on the state of the art by 2.6%, achieving an accuracy of 46.3%. On the WikiSQL benchmark, MAPO achieves an accuracy of 74.9% with only weak supervision, outperforming several strong baselines with full supervision. Our source code is available at https://goo.gl/TXBp4e
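The two-term decomposition described in the abstract can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' implementation: the four-trajectory policy, the reward values, and the names `probs`, `rewards`, `in_buffer`, `expected_return`, and `decomposed_return` are all made up for illustration. It assumes the weight on the inside-buffer term is the total policy probability of the buffered trajectories, and that each expectation is taken under the policy renormalized over its own subset.

```python
# Hypothetical toy setup: four enumerable trajectories under a fixed policy.
probs = [0.1, 0.2, 0.3, 0.4]               # policy probability of each trajectory
rewards = [1.0, 0.0, 0.5, 0.0]             # return of each trajectory
in_buffer = [True, False, False, False]    # which trajectories sit in the memory buffer


def expected_return(probs, rewards):
    """Exact expected return: sum over all trajectories of pi(a) * R(a)."""
    return sum(p * r for p, r in zip(probs, rewards))


def decomposed_return(probs, rewards, in_buffer):
    """Same quantity written as a weighted sum of two expectations.

    pi_b is the total probability the policy assigns to the buffer.
    The inside term is an expectation under the policy renormalized over
    the buffer; the outside term is renormalized over the rest.
    """
    pi_b = sum(p for p, b in zip(probs, in_buffer) if b)
    if not 0.0 < pi_b < 1.0:
        raise ValueError("sketch assumes the buffer has probability strictly between 0 and 1")
    inside = sum(p / pi_b * r for p, r, b in zip(probs, rewards, in_buffer) if b)
    outside = sum(p / (1.0 - pi_b) * r for p, r, b in zip(probs, rewards, in_buffer) if not b)
    return pi_b * inside + (1.0 - pi_b) * outside


direct = expected_return(probs, rewards)
split = decomposed_return(probs, rewards, in_buffer)
print(direct, split)
```

In this sketch the inside term is computed by exact enumeration, which is cheap when the buffer is small. Only the outside term would need sampling in a real setting, which is one way to read the abstract's claim about reduced variance.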

artificial intelligence, machine learning, natural language, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.63)
Information Technology > Artificial Intelligence > Machine Learning (0.63)

Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

Neural Information Processing Systems, Nov-20-2025, 23:13:51 GMT

artificial intelligence, memory augmented policy optimization, natural language, (10 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Reviews: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

Neural Information Processing Systems, Oct-9-2024, 04:09:38 GMT

This paper describes a Reinforcement Learning algorithm adapted to settings with sparse reward and weak supervision, and applies it to program synthesis, achieving state-of-the-art results and even outperforming baselines with full supervision. The first two sections explain very clearly the motivation of this work, presenting the current limitations of reinforcement learning for tasks like contextual program synthesis. It is nicely written and pleasant to read. Section 3 presents the Reinforcement Learning framework that is the basis of the proposal, where the goal is to find a good approximation of the expected return objective. Section 4 presents the MAPO algorithm and its three key points: "(1) distributed sampling from inside and outside memory with an actor-learner architecture; (2) a marginal likelihood constraint over the memory to accelerate training; (3) systematic exploration to discover new high reward trajectories" (I did not find a better phrasing to summarize than the one in the abstract and the conclusion).
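Key point (2) concerns how much weight the memory receives during training. As a rough illustration only, the sketch below assumes this corresponds to the "memory weight clipping" named in the paper's abstract, and further assumes that clipping means lower-bounding the buffer's weight by a hyperparameter. The names `alpha`, `clipped_memory_weight`, and `combine_gradients` are hypothetical, and the gradient vectors are placeholder numbers rather than real model gradients.

```python
def clipped_memory_weight(pi_b, alpha):
    """Assumed form of clipping: the buffer weight never drops below alpha."""
    return max(pi_b, alpha)


def combine_gradients(inside_grad, outside_grad, pi_b, alpha):
    """Weighted sum of the inside-buffer and outside-buffer gradient estimates.

    inside_grad and outside_grad are plain lists of floats standing in for
    gradient vectors; a real implementation would use framework tensors.
    """
    w = clipped_memory_weight(pi_b, alpha)
    return [w * gi + (1.0 - w) * go for gi, go in zip(inside_grad, outside_grad)]


# Early in training the policy may put almost no mass on the buffer (pi_b is tiny),
# so without a floor the inside-buffer term would contribute almost nothing.
early = combine_gradients([1.0, -2.0], [0.0, 0.0], pi_b=0.001, alpha=0.1)
# Once pi_b exceeds alpha, the floor is inactive and the true weight is used.
late = combine_gradients([1.0, -2.0], [0.0, 0.0], pi_b=0.5, alpha=0.1)
print(early, late)
```

Under these assumptions, the floor keeps known high-reward trajectories influential while the policy still assigns them low probability. This is consistent with the stated aim of accelerating training, although the resulting gradient is no longer an unbiased estimate whenever the floor is active.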

memory augmented policy optimization, program synthesis and semantic parsing, trajectory, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.86)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.85)

Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

Neural Information Processing Systems, Oct-8-2024, 20:08:53 GMT

memory augmented policy optimization, program synthesis and semantic parsing, trajectory, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.78)

Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

Liang, Chen, Norouzi, Mohammad, Berant, Jonathan, Le, Quoc V., Lao, Ni

Neural Information Processing Systems, Feb-14-2020, 20:57:12 GMT

memory augmented policy optimization, program synthesis and semantic parsing, trajectory, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.79)
Information Technology > Artificial Intelligence > Machine Learning (0.72)
