the following inequality h...
Appendices
In detail, we choose UMAP [15] as the projection algorithm and train the projecting function in Hopper using 64000 transitions sampled by the expert agent. To evaluate a policy, we sample the same number of transitions and then project them onto a 2-dimensional space with the trained projecting function. For empirical estimation, we subsequently discretize the projected 2-dimensional state space into small grid regions and estimate the distribution via Kernel Density Estimation (KDE) [19] with a Gaussian kernel. These two hyperparameters affect the experimental results more significantly. Moreover, as mentioned in Section 6.3, they can be tuned based on the distribution of the dataset.
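The estimation step can be illustrated with a minimal sketch. It is not the implementation used in the experiments: a fixed random linear map (`project_2d`, a hypothetical helper) stands in for the trained UMAP projecting function, synthetic Gaussian vectors stand in for sampled Hopper transitions, and the state dimension, grid size, grid range, and bandwidth are illustrative assumptions rather than values from the paper. The sketch only shows how projected 2-dimensional points can be turned into a per-cell distribution with a Gaussian-kernel KDE.

```python
import math
import random


def project_2d(state, weights):
    """Stand-in for the trained projecting function (the paper uses UMAP).

    Here it is just a fixed linear map from the state space to 2-D.
    """
    return tuple(sum(w * s for w, s in zip(row, state)) for row in weights)


def gaussian_kde_on_grid(points, bins, bandwidth, lo, hi):
    """Evaluate a 2-D Gaussian KDE at the centers of a bins x bins grid
    over [lo, hi]^2, then normalize so the cell values sum to 1."""
    step = (hi - lo) / bins
    centers = [lo + (i + 0.5) * step for i in range(bins)]
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    grid = []
    for cx in centers:
        row = []
        for cy in centers:
            density = norm * sum(
                math.exp(-((cx - x) ** 2 + (cy - y) ** 2) / (2.0 * bandwidth ** 2))
                for x, y in points
            )
            row.append(density)
        grid.append(row)
    total = sum(sum(row) for row in grid)
    return [[value / total for value in row] for row in grid]


rng = random.Random(0)
state_dim = 11      # assumed state dimension, for illustration only
num_samples = 200   # far fewer than the 64000 transitions used in the paper

# Hypothetical projection weights and synthetic "states".
weights = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(2)]
states = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(num_samples)]

points = [project_2d(s, weights) for s in states]
cell_probs = gaussian_kde_on_grid(points, bins=10, bandwidth=0.5, lo=-8.0, hi=8.0)
```

With a real projection, `project_2d` would be replaced by the transform of the fitted projection model, and the same grid would be reused for every policy so that the resulting per-cell distributions are comparable.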