Appendix A Additional table
–Neural Information Processing Systems
Table 2 presents the numerical results for the ablation study in Section 4.2. The results of our main method in Section 4.1 are reported in the column Main. Table 3 provides an additional ablation study on several building blocks of our main method. T est denotes the variant of using the estimated reward function as the test function when We see that changing the proposed JSD regularizer in Section 3.2 to the KL-dual-based regularizers Changing the implicit policy to the Gaussian policy generally leads to worse performance. The performance difference is especially significant on the Maze2D and Adroit datasets.
Aug-15-2025, 16:31:35 GMT