Appendices
Neural Information Processing Systems
In detail, we choose UMAP [15] as the projection algorithm and train the projection function in Hopper using 64000 transitions sampled by the expert agent. To evaluate a policy, we sample the same number of transitions and then project them onto a 2-dimensional space with the trained projection function. For empirical estimation, we subsequently discretize the projected 2-dimensional state space into small grid regions and estimate the distribution via Kernel Density Estimation (KDE) [19] with a Gaussian kernel. These two hyperparameters affect the experimental results more significantly. Moreover, as mentioned in Section 6.3, they can be tuned based on the distribution of the dataset.
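The estimation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `project` is a hypothetical stand-in for the trained UMAP projection function, and the names `gaussian_kde_on_grid`, `bandwidth`, `grid_size`, and `bounds`, as well as all numeric values, are chosen here for illustration only.

```python
import math
import random


def project(state):
    # Hypothetical stand-in for the trained 2-D projection function.
    # In the setup described above this would be a UMAP model fitted on
    # expert transitions; here we simply keep the first two coordinates.
    return (state[0], state[1])


def gaussian_kde_on_grid(points, bandwidth, grid_size, bounds):
    """Evaluate a 2-D Gaussian KDE at the centers of a grid_size x grid_size
    grid and return per-cell probability masses normalized to sum to 1."""
    (x_min, x_max), (y_min, y_max) = bounds
    dx = (x_max - x_min) / grid_size
    dy = (y_max - y_min) / grid_size
    # Normalizing constant of an isotropic 2-D Gaussian kernel, averaged over points.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    grid = []
    for i in range(grid_size):
        cx = x_min + (i + 0.5) * dx  # x-coordinate of the cell center
        row = []
        for j in range(grid_size):
            cy = y_min + (j + 0.5) * dy  # y-coordinate of the cell center
            density = norm * sum(
                math.exp(-((cx - px) ** 2 + (cy - py) ** 2) / (2.0 * bandwidth ** 2))
                for px, py in points
            )
            row.append(density * dx * dy)  # approximate mass of this cell
        grid.append(row)
    total = sum(sum(row) for row in grid)
    # Renormalize so the discretized distribution sums to exactly 1 over the grid.
    return [[mass / total for mass in row] for row in grid]


# Toy data standing in for states visited by the evaluated policy (assumption).
random.seed(0)
policy_states = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]
points = [project(s) for s in policy_states]
masses = gaussian_kde_on_grid(
    points, bandwidth=0.3, grid_size=20, bounds=((-4.0, 4.0), (-4.0, 4.0))
)
```

The result `masses` is a discretized state distribution over the grid. Two such grids, for example one from the expert and one from the evaluated policy, could then be compared cell by cell. The grid resolution and kernel bandwidth used here are illustrative knobs and would need to be adapted to the spread of the projected data.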
Feb-8-2026, 03:34:38 GMT