Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing
Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V. Le, Ni Lao
Neural Information Processing Systems
MAPO improves the sample efficiency and robustness of policy gradient, especially on tasks with sparse rewards.
Nov-20-2025, 21:14:06 GMT