Policy Optimization for Markov Games: Unified Framework and Faster Convergence
Runyu Zhang, Harvard University
Neural Information Processing Systems
Policy optimization, i.e., algorithms that learn to make sequential decisions by local search directly on the agent's policy, is a widely used class of algorithms in reinforcement learning [
Aug-16-2025, 18:01:34 GMT