A Appendix

A.1 Learning Curves
Neural Information Processing Systems
Table 6: Hyper-parameters for SAC (on Atari)

Hyper-parameter                   Value
Total steps                       1,000,000
Replay buffer size                100,000
Discount factor γ                 0.99
Learning start                    80,000
Actor train frequency             4
Critic train frequency            4
Target network update frequency   8,000
Actor learning rate               3 10
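As a reading aid, the update schedule implied by Table 6 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `SACAtariConfig` and `scheduled_events` are hypothetical. The 0.99 entry is treated as the discount factor. The actor learning rate is left unset because its exponent is truncated in the table. All frequencies are assumed to be counted in environment steps, with updates beginning once the step counter reaches the learning start.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SACAtariConfig:
    """Values copied from Table 6; field names are hypothetical."""
    total_steps: int = 1_000_000
    replay_buffer_size: int = 100_000
    gamma: float = 0.99  # assumption: the 0.99 entry is the discount factor
    learning_start: int = 80_000
    actor_train_frequency: int = 4
    critic_train_frequency: int = 4
    target_update_frequency: int = 8_000
    # The exponent of the actor learning rate is truncated in the table,
    # so no value is filled in here.
    actor_learning_rate: Optional[float] = None


def scheduled_events(step: int, cfg: SACAtariConfig) -> List[str]:
    """Return the updates that would fire at `step` (assumed semantics)."""
    if step < cfg.learning_start:
        return []  # still only collecting experience
    events: List[str] = []
    if step % cfg.critic_train_frequency == 0:
        events.append("critic")
    if step % cfg.actor_train_frequency == 0:
        events.append("actor")
    if step % cfg.target_update_frequency == 0:
        events.append("target")
    return events
```

Under these assumptions, no update fires during the first 80,000 steps, the actor and critic are then trained every fourth step, and the target network is refreshed once per 8,000 steps.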