Neural Information Processing Systems 

Freeway is excluded from this table as Junyent et al. [

Epochs: 8
Loss function for policy: Categorical cross-entropy
Loss function for value function: Huber
Discount factor used in TD learning: 0.99
Time steps between target network updates (for value network): 10,000
Interval size of learning schedule:

Due to computational constraints, we could not tune the hyperparameters of N-CPL.
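
As a rough illustration of how the listed settings could fit together, the sketch below wires the discount factor (0.99), the Huber value loss, the categorical cross-entropy policy loss, and the 10,000-step target-network interval into small helper functions. It is not the authors' implementation: the function names, the one-step form of the TD target, and the Huber threshold `delta=1.0` are all assumptions made for this example.

```python
import math

# Values taken from the hyperparameter list; all names are hypothetical.
GAMMA = 0.99                  # discount factor used in TD learning
TARGET_UPDATE_EVERY = 10_000  # time steps between target network updates


def td_target(reward, next_value, done, gamma=GAMMA):
    """One-step TD target (assumed form); next_value would come from the target network."""
    return reward + (0.0 if done else gamma * next_value)


def huber(error, delta=1.0):
    """Huber loss of a single error term; delta=1.0 is an assumed threshold."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def categorical_crossentropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target action distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def should_sync_target(step, every=TARGET_UPDATE_EVERY):
    """True on steps where the target network would be copied from the online network."""
    return step > 0 and step % every == 0
```

Under these assumptions, the value loss for one transition would be `huber(td_target(r, v_next, done) - v_pred)`, and `should_sync_target(step)` would gate the copy of online weights into the target network.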