Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning
–Neural Information Processing Systems
Let = be the optimal policy of the CMDP problem in (9). Theorem 3. For any K ≥ 1, take uniform policy 0, 0 16 , 6 (1 )3, = 1 , and t_k = ⌈ 11 log (5LK6 log (|A|)) + 1 ⌉.
Feb-9-2026, 03:44:36 GMT
- Country:
- North America > United States > Texas > Brazos County > College Station (0.05)