A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
Carlo Alfano, Department of Statistics, University of Oxford
Neural Information Processing Systems
In this work, we introduce a framework for policy optimization based on mirror descent that naturally accommodates general parameterizations. The policy class induced by our scheme recovers known classes, e.g., softmax, and generates new ones depending on the choice of mirror map.
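As a rough illustration of how a mirror map can induce a policy class, the sketch below shows the familiar tabular case at a single state: with the negative-entropy mirror map, one mirror descent step reduces to a multiplicative-weights update, which is a softmax over `log(policy) + step_size * q_values`. This is the standard textbook special case, not the general-parameterization algorithm of the paper; the function names, the Q-values, and the step size are hypothetical and chosen only for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pmd_step_negative_entropy(policy, q_values, step_size):
    """One tabular mirror descent step at a single state.

    Assumption: the mirror map is the negative entropy, so the update is
    pi_new(a) proportional to pi(a) * exp(step_size * Q(a)),
    i.e. a softmax over log pi(a) + step_size * Q(a).
    """
    logits = [math.log(p) + step_size * q for p, q in zip(policy, q_values)]
    return softmax(logits)


# Hypothetical example: four actions, uniform starting policy.
policy = [0.25, 0.25, 0.25, 0.25]
q_values = [1.0, 0.0, -1.0, 0.5]
new_policy = pmd_step_negative_entropy(policy, q_values, step_size=0.5)
```

Starting from the uniform policy, the updated policy coincides with a softmax of the scaled Q-values, which is one way to see why the softmax class appears under this mirror map. Other mirror maps would change the form of the update and hence the induced policy class.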
Feb-12-2026, 16:30:57 GMT
- Country:
  - Asia
    - Middle East > Jordan (0.04)
    - Russia (0.04)
  - Europe
    - France (0.04)
    - Russia (0.04)
    - United Kingdom > England
      - Oxfordshire > Oxford (0.50)
  - North America > United States
    - Illinois > Cook County > Chicago (0.04)
- Genre:
  - Research Report (0.67)
- Technology: