Policy Optimization for Markov Games: Unified Framework and Faster Convergence
Runyu Zhang, Harvard University
Neural Information Processing Systems
Policy optimization, i.e., algorithms that learn to make sequential decisions by performing local search directly on the agent's policy, is a widely used class of algorithms in reinforcement learning [