Reviews: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

Neural Information Processing Systems 

This paper describes a Reinforcement Learning algorithm adapted to settings with sparse rewards and weak supervision, and applies it to program synthesis, achieving state-of-the-art results and even outperforming baselines trained with full supervision. The first two sections explain the motivation of this work very clearly, presenting the current limitations of reinforcement learning for tasks like contextual program synthesis. The paper is nicely written and pleasant to read. Section 3 presents the Reinforcement Learning framework that is the basis of the proposal, where the goal is to find a good approximation of the expected return objective. Section 4 presents the MAPO algorithm and its three key points: "(1) distributed sampling from inside and outside memory with an actor-learner architecture; (2) a marginal likelihood constraint over the memory to accelerate training; (3) systematic exploration to discover new high reward trajectories" (I did not find a better phrasing to summarize these than the one in the abstract and the conclusion).
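For concreteness, the decomposition underlying these points, as I reconstruct it from my reading of Section 4 (the buffer notation $\mathcal{B}$ and the clipping threshold $\alpha$ are taken from the paper), splits the expected return over trajectories inside and outside the memory:

$$\mathcal{O}(\theta) \;=\; \sum_{a \in \mathcal{B}} \pi_\theta(a)\, R(a) \;+\; \sum_{a \notin \mathcal{B}} \pi_\theta(a)\, R(a)$$

The first term is computed exactly by enumerating the buffer $\mathcal{B}$ of discovered high-reward trajectories, while the second is estimated by sampling from $\pi_\theta$ restricted to trajectories outside $\mathcal{B}$. The marginal likelihood constraint clips the total buffer probability $\pi_{\mathcal{B}} = \sum_{a \in \mathcal{B}} \pi_\theta(a)$ from below by $\alpha$, so the exact term keeps a nonzero weight early in training, which is what accelerates convergence.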