Appendix
A Fine-Tuning Result for Sanity Check
B Training Details
C Model Architecture and Module Names
D Activation Similarity
D.1 Details of Experiments
–Neural Information Processing Systems
We compare our performance results with those of the prior study [9] to confirm that they do not deviate substantially from it; the results are shown in Table 1. Mean return is the sum of rewards over a trajectory, averaged over trajectories. For further details of the datasets and metrics, please refer to the paper that proposes D4RL [38]. Although the previous work [9] used several techniques to improve the performance, e.g. The training details are described in Appendix B. The results for the previous work are the average and standard deviation over three random seeds, while ours are over two random seeds. Because the aim of this comparison is only to confirm that our results are not pathological, checking soundness with two seeds is sufficient. For reference, we also include the results of the Decision Transformer (DT), since the randomly initialized model (Random Init) is a large Decision Transformer.
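As an illustration of the metric described above, the following minimal sketch computes a mean return per seed and then the average and standard deviation across seeds. The function names and the toy reward values are hypothetical and are not taken from the paper, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is reported.

```python
from statistics import mean, stdev


def mean_return(trajectories):
    """Sum the rewards of each trajectory, then average over trajectories."""
    returns = [sum(rewards) for rewards in trajectories]
    return mean(returns)


def summarize_over_seeds(per_seed_returns):
    """Average and sample standard deviation across seeds (assumed estimator)."""
    return mean(per_seed_returns), stdev(per_seed_returns)


# Hypothetical toy rewards for two seeds, two trajectories each.
seed_a = [[1.0, 2.0, 3.0], [2.0, 2.0]]  # trajectory returns 6.0 and 4.0
seed_b = [[4.0, 3.0], [1.0, 2.0, 2.0]]  # trajectory returns 7.0 and 5.0

per_seed = [mean_return(seed_a), mean_return(seed_b)]
avg, sd = summarize_over_seeds(per_seed)
print(f"mean return over seeds: {avg:.2f} +/- {sd:.2f}")
```

With only two seeds the standard deviation is a rough indication of spread at best, which is consistent with using the comparison only as a sanity check.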
Feb-10-2025, 09:29:21 GMT