Supplementary Materials

A Experiment

Neural Information Processing Systems

We adopt a neural softmax policy with two hidden layers of size (128, 128).

1. $\int_{a \in \mathcal{A}} 1 \, da = C_{\mathcal{A}} < \infty$; 2. $\pi_w$ is the Gaussian policy, i.e., $\pi_w(s) = \mathcal{N}(f(w), \sigma^2)$, with $f(w)$ being $L_f$-Lipschitz (0
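
As an illustration of the neural softmax policy with two hidden layers of size (128, 128) mentioned above, the following is a minimal sketch in plain Python. It is not the authors' implementation: the tanh activation, the Gaussian weight initialisation, and the names `init_params`, `affine`, `policy_probs`, `obs_dim`, and `n_actions` are all assumptions introduced here for illustration.

```python
import math
import random


def init_params(obs_dim, n_actions, hidden=(128, 128), seed=0):
    """Create weights for obs_dim -> 128 -> 128 -> n_actions (hypothetical init)."""
    rng = random.Random(seed)
    sizes = (obs_dim, *hidden, n_actions)
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)  # assumed scaling, not from the source
        W = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(fan_out)]
        b = [0.0] * fan_out
        params.append((W, b))
    return params


def affine(W, b, x):
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def policy_probs(params, obs):
    """Return softmax action probabilities for one observation."""
    h = list(obs)
    for W, b in params[:-1]:
        h = [math.tanh(z) for z in affine(W, b, h)]  # tanh is an assumption
    logits = affine(*params[-1], h)
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes: a 4-dimensional observation and 2 discrete actions.
params = init_params(obs_dim=4, n_actions=2)
probs = policy_probs(params, [0.1, -0.2, 0.3, 0.0])
```

The output of `policy_probs` is a probability vector over the discrete actions, from which an action can be sampled, for example with `random.choices(range(len(probs)), weights=probs)`.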