Convergent Policy Optimization for Safe Reinforcement Learning
Ming Yu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang
Neural Information Processing Systems
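Given $\theta$, $\hat{J}(\theta)$ and $\hat{D}(\theta)$ are the estimates of $J(\theta)$ and $D(\theta)$ computed from a single sample (i.e., a trajectory). Note that $\hat{J}(\theta)$ and $\hat{D}(\theta)$ are random, and their randomness comes from the sampled trajectory. Clearly, $J(\theta) = \mathbb{E}[\hat{J}(\theta)]$ and $D(\theta) = \mathbb{E}[\hat{D}(\theta)]$.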
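The unbiasedness identities $J(\theta) = \mathbb{E}[\hat{J}(\theta)]$ and $D(\theta) = \mathbb{E}[\hat{D}(\theta)]$ can be checked numerically. The sketch below assumes that $J$ is an expected discounted reward and $D$ an expected discounted constraint cost over a finite horizon, and it uses a hypothetical toy model: a one-parameter policy that picks a "risky" action with probability `theta`, plus made-up per-step rewards and costs. None of these names come from the paper. Each call to `single_sample_estimates` returns the one-trajectory estimates, and averaging many of them should approach the closed-form values from `exact_values`.

```python
import random

# Hypothetical per-step reward and constraint cost for each action (toy model).
REWARD = {0: 0.2, 1: 1.0}
COST = {0: 0.0, 1: 1.0}


def rollout(theta, horizon, rng):
    """Sample one trajectory of actions from a hypothetical policy that takes
    the risky action 1 with probability theta and the safe action 0 otherwise."""
    return [1 if rng.random() < theta else 0 for _ in range(horizon)]


def single_sample_estimates(theta, horizon, gamma, rng):
    """Return (J_hat, D_hat): discounted reward and cost of ONE trajectory."""
    actions = rollout(theta, horizon, rng)
    j_hat = sum(gamma ** t * REWARD[a] for t, a in enumerate(actions))
    d_hat = sum(gamma ** t * COST[a] for t, a in enumerate(actions))
    return j_hat, d_hat


def exact_values(theta, horizon, gamma):
    """Closed-form J(theta) and D(theta) for the toy model above."""
    discount_sum = sum(gamma ** t for t in range(horizon))
    j = (theta * REWARD[1] + (1 - theta) * REWARD[0]) * discount_sum
    d = (theta * COST[1] + (1 - theta) * COST[0]) * discount_sum
    return j, d


if __name__ == "__main__":
    rng = random.Random(0)
    theta, horizon, gamma, n = 0.3, 10, 0.9, 20000
    samples = [single_sample_estimates(theta, horizon, gamma, rng) for _ in range(n)]
    j_mean = sum(s[0] for s in samples) / n
    d_mean = sum(s[1] for s in samples) / n
    # Sample means of the single-trajectory estimates vs. the exact expectations.
    print("exact:", exact_values(theta, horizon, gamma))
    print("mean of estimates:", (j_mean, d_mean))
```

Any single $\hat{J}(\theta)$ or $\hat{D}(\theta)$ can be far from its mean. Only the average over many trajectories concentrates around $J(\theta)$ and $D(\theta)$, which is why such estimates are treated as random quantities.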
Feb-14-2026, 13:53:57 GMT
- Country:
  - North America
    - Canada > British Columbia
    - United States
      - California (0.04)
      - Illinois > Cook County
      - New Jersey > Mercer County
        - Princeton (0.04)
      - New York (0.04)
      - Pennsylvania (0.04)
  - South America > Chile