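A.1 Hyper-Parameters

For all datasets, the surrogate gradient function is $\sigma(x) = \frac{1}{\pi} \arctan\left(\frac{\pi}{2} \alpha x\right) + \frac{1}{2}$, thus $\sigma'(x) = \frac{\alpha}{2\left(1 + \left(\frac{\pi}{2} \alpha x\right)^2\right)}$.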
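As a sanity check on the arctan surrogate and its derivative, the following minimal sketch evaluates both in plain Python and compares the analytic derivative against a central finite difference. The function names (`sigma`, `sigma_prime`, `central_difference`) are hypothetical, and the value `alpha = 2.0` is an illustrative assumption, not a setting taken from the text.

```python
import math


def sigma(x: float, alpha: float) -> float:
    """Surrogate function: (1/pi) * arctan((pi/2) * alpha * x) + 1/2."""
    return math.atan(0.5 * math.pi * alpha * x) / math.pi + 0.5


def sigma_prime(x: float, alpha: float) -> float:
    """Analytic derivative: alpha / (2 * (1 + ((pi/2) * alpha * x)^2))."""
    return alpha / (2.0 * (1.0 + (0.5 * math.pi * alpha * x) ** 2))


def central_difference(x: float, alpha: float, h: float = 1e-6) -> float:
    """Numerical derivative of sigma, used only to cross-check sigma_prime."""
    return (sigma(x + h, alpha) - sigma(x - h, alpha)) / (2.0 * h)


if __name__ == "__main__":
    alpha = 2.0  # assumed value for illustration only
    for x in (-1.0, -0.1, 0.0, 0.1, 1.0):
        analytic = sigma_prime(x, alpha)
        numeric = central_difference(x, alpha)
        print(f"x={x:+.1f}  sigma={sigma(x, alpha):.6f}  "
              f"analytic={analytic:.6f}  numeric={numeric:.6f}")
```

The derivative peaks at $x = 0$ with value $\alpha / 2$ and decays as $|x|$ grows, so a larger $\alpha$ concentrates the gradient more tightly around the origin.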