Verifiable Reinforcement Learning via Policy Extraction
Osbert Bastani, Yewen Pu, Armando Solar-Lezama
Neural Information Processing Systems
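Trajectories taken by $\pi^*$, $\pi_{\text{left}} : s \mapsto \text{left}$, and $\pi_{\text{right}} : s \mapsto \text{right}$ are shown as dashed edges, red edges, and green edges, respectively. Let $\Pi = \{\pi_{\text{left}} : s \mapsto \text{left},\ \pi_{\text{right}} : s \mapsto \text{right}\}$, and let $g(\pi) = \mathbb{E}_{s \sim d^{(\pi)}}[g(s, \pi)]$ be the 0-1 loss.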
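As a rough illustration of the 0-1 loss mentioned above, the sketch below estimates it as the fraction of sampled states on which a policy disagrees with a reference policy. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-state loss is assumed to be the indicator that the two policies choose different actions, and the `reference` policy, the integer state encoding, and the `sampled_states` list are hypothetical stand-ins for states drawn from the policy's own state distribution.

```python
from typing import Callable, Hashable, Iterable

State = Hashable
Policy = Callable[[State], str]


def zero_one_loss(policy: Policy, reference: Policy, states: Iterable[State]) -> float:
    """Empirical 0-1 loss: fraction of states where `policy` and `reference` disagree.

    Assumption: `states` were sampled from the state distribution induced by `policy`.
    """
    states = list(states)
    if not states:
        raise ValueError("need at least one sampled state")
    mismatches = sum(1 for s in states if policy(s) != reference(s))
    return mismatches / len(states)


def pi_left(s: State) -> str:
    # Constant policy: every state maps to "left".
    return "left"


def pi_right(s: State) -> str:
    # Constant policy: every state maps to "right".
    return "right"


def reference(s: int) -> str:
    # Hypothetical reference policy, invented for this example only:
    # go right in even-numbered states, left in odd-numbered ones.
    return "right" if s % 2 == 0 else "left"


# Hypothetical sample of visited states; for simplicity the same sample is
# reused for both policies, although each policy induces its own distribution.
sampled_states = [0, 1, 2, 4]

loss_left = zero_one_loss(pi_left, reference, sampled_states)
loss_right = zero_one_loss(pi_right, reference, sampled_states)
```

In a faithful estimate, each policy would be evaluated on states collected by rolling out that same policy, since the expectation is taken over the distribution the policy itself induces.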
Feb-15-2026, 01:51:09 GMT
- Country:
  - North America
    - Canada > Quebec
      - Montreal (0.04)
    - United States
      - California (0.04)
      - Massachusetts > Middlesex County
        - Cambridge (0.05)