Unified
–Neural Information Processing Systems
Policy optimization, i.e. algorithms that learn to make sequential decisions by local search on the agent's policy directly, is a widely used class of algorithms in reinforcement learning [40, 44, 45].
Neural Information Processing Systems
Feb-19-2026, 07:24:31 GMT