Verifiable Reinforcement Learning via Policy Extraction

Osbert Bastani, Yewen Pu, Armando Solar-Lezama

Neural Information Processing Systems 

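Trajectories taken by $\pi^*$, $\pi_{\text{left}} : s \mapsto \text{left}$, and $\pi_{\text{right}} : s \mapsto \text{right}$ are shown as dashed edges, red edges, and green edges, respectively. Let $\Pi = \{\pi_{\text{left}} : s \mapsto \text{left},\ \pi_{\text{right}} : s \mapsto \text{right}\}$, and let $g(\pi) = \mathbb{E}_{s \sim d^{(\pi)}}[g(s, \pi)]$ be the 0-1 loss.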
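As a rough illustration of the 0-1 loss above, the sketch below estimates it by averaging disagreements over a list of sampled states. It assumes the per-state loss is 1 when the policy's action differs from a reference policy's action and 0 otherwise; the excerpt does not spell out the per-state loss, so treat that as an assumption. The names `zero_one_loss`, `reference`, and the integer states are hypothetical, and the states are assumed to have already been drawn from the state distribution induced by the policy being evaluated.

```python
from typing import Callable, Hashable, Sequence

State = Hashable
Action = str
Policy = Callable[[State], Action]


def pi_left(s: State) -> Action:
    """Constant policy that always moves left."""
    return "left"


def pi_right(s: State) -> Action:
    """Constant policy that always moves right."""
    return "right"


def zero_one_loss(policy: Policy, reference: Policy, states: Sequence[State]) -> float:
    """Empirical 0-1 loss: the fraction of sampled states where `policy`
    and `reference` choose different actions.

    Assumption: `states` were sampled from the state distribution induced
    by `policy`; this function does not perform the rollouts itself.
    """
    if len(states) == 0:
        raise ValueError("need at least one sampled state")
    mismatches = sum(1 for s in states if policy(s) != reference(s))
    return mismatches / len(states)


def reference(s: State) -> Action:
    """Hypothetical reference policy, used only for this illustration:
    go left on even-numbered states and right on odd-numbered ones."""
    return "left" if s % 2 == 0 else "right"


sampled_states = [0, 1, 2, 3]  # stand-in for states visited during rollouts
loss_left = zero_one_loss(pi_left, reference, sampled_states)
loss_right = zero_one_loss(pi_right, reference, sampled_states)
```

On these four hypothetical states each constant policy disagrees with the reference on exactly half of them, so a pure 0-1 criterion cannot tell the two apart here.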
