 spiking resnet



Neural Information Processing Systems

A.1 Hyper-Parameters. For all datasets, the surrogate gradient function is $\sigma(x) = \frac{1}{\pi}\arctan\left(\frac{\pi}{2}\alpha x\right) + \frac{1}{2}$, thus $\sigma'(x) = \frac{\alpha}{2\left(1 + \left(\frac{\pi}{2}\alpha x\right)^2\right)}$.

The results on the three networks are consistent, indicating that RTD is a general sequential data augmentation method.

We compare different surrogate functions, including Rectangular ($\sigma'(x) = \mathrm{sign}(|x| < \frac{1}{2})$), ArcTan ($\sigma'(x) = \frac{1}{1 + (\pi x)^2}$) and Constant 1 ($\sigma'(x) \equiv 1$), in the SNNs on CIFAR-10. The results are shown in Tab. 9, which indicates that the choice of surrogate function has a considerable influence on the SNN's performance. Although Rectangular and Constant 1 can avoid the gradient exploding/vanishing problems in Eq. (8), they still cause lower accuracy or even prevent the optimization from converging.
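The three surrogate gradients compared above can be written down in a few lines. The following is a minimal sketch in plain Python, not the authors' code: the function names and the default `alpha=2.0` are assumptions made for illustration (with α = 2 the general arctan derivative reduces to the ArcTan form $\frac{1}{1+(\pi x)^2}$ quoted in the excerpt), and the Rectangular indicator is read as 1 inside $|x| < \frac{1}{2}$ and 0 elsewhere. A central finite difference is included only to check the arctan derivative against σ itself.

```python
import math


def arctan_surrogate(x, alpha=2.0):
    """sigma(x) = (1/pi) * arctan((pi/2) * alpha * x) + 1/2."""
    return math.atan(math.pi / 2 * alpha * x) / math.pi + 0.5


def arctan_grad(x, alpha=2.0):
    """sigma'(x) = alpha / (2 * (1 + ((pi/2) * alpha * x)^2))."""
    return alpha / (2.0 * (1.0 + (math.pi / 2 * alpha * x) ** 2))


def rectangular_grad(x):
    """Rectangular surrogate: 1 inside |x| < 1/2, 0 elsewhere (assumed reading)."""
    return 1.0 if abs(x) < 0.5 else 0.0


def constant_grad(x):
    """Constant-1 surrogate: the gradient is 1 for every input."""
    return 1.0


def numeric_grad(f, x, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


if __name__ == "__main__":
    for x in (-1.0, -0.25, 0.0, 0.25, 1.0):
        print(
            f"x={x:+.2f}  arctan={arctan_grad(x):.4f}  "
            f"rect={rectangular_grad(x):.1f}  const={constant_grad(x):.1f}"
        )
```

The arctan gradient peaks at α/2 when x = 0 and decays smoothly away from the threshold, whereas the Rectangular and Constant 1 surrogates are piecewise constant.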





Training Full Spike Neural Networks via Auxiliary Accumulation Pathway

Chen, Guangyao, Peng, Peixi, Li, Guoqi, Tian, Yonghong

arXiv.org Artificial Intelligence

Because binary spike signals allow the traditional high-power multiply-accumulation (MAC) to be replaced by a low-power accumulation (AC), brain-inspired Spiking Neural Networks (SNNs) are gaining more and more attention. However, the binary spike propagation of Full-Spike Neural Networks (FSNNs) with limited time steps is prone to significant information loss. To improve performance, several state-of-the-art SNN models trained from scratch inevitably introduce many non-spike operations. These non-spike operations cause additional computational consumption and may not be deployable on neuromorphic hardware where only spike operations are allowed. To train a large-scale FSNN with high performance, this paper proposes a novel Dual-Stream Training (DST) method, which adds a detachable Auxiliary Accumulation Pathway (AAP) to the full spiking residual networks. The accumulation in the AAP can compensate for the information loss during the forward and backward passes of full spike propagation and facilitates the training of the FSNN. In the test phase, the AAP can be removed so that only the FSNN remains. This not only keeps the energy consumption low but also makes the model easy to deploy. Moreover, in cases where non-spike operations are available, the AAP can also be retained at test time to improve feature discrimination at the cost of a little non-spike consumption. Extensive experiments on the ImageNet, DVS Gesture, and CIFAR10-DVS datasets demonstrate the effectiveness of DST.
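The MAC-to-AC claim in the abstract's first sentence can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's implementation: `mac_layer`, `ac_layer`, and the small weight matrix are hypothetical names and values chosen for illustration. When the input is a binary spike vector, each weight is either added or skipped, so the weighted sum needs no multiplications.

```python
def mac_layer(weights, inputs):
    """Dense layer with multiply-accumulate: one multiply and one add per weight."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def ac_layer(weights, spikes):
    """Same layer for binary inputs: accumulate only the weights whose input spiked."""
    return [sum(w for w, s in zip(row, spikes) if s) for row in weights]


# Hypothetical 2x3 weight matrix and a binary spike vector.
weights = [[0.5, -1.0, 0.25],
           [2.0, 0.75, -0.5]]
spikes = [1, 0, 1]

if __name__ == "__main__":
    print(mac_layer(weights, spikes))  # multiply-accumulate result
    print(ac_layer(weights, spikes))   # accumulate-only result, same values
```

Both functions return the same outputs for binary inputs; only the second avoids multiplications, which is the property a non-spike (real-valued) operation would break.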