Goto

Collaborating Authors

 erf


Unveiling the Spatial-temporal Effective Receptive Fields of Spiking Neural Networks

Neural Information Processing Systems

Spiking Neural Networks (SNNs) demonstrate significant potential for energyefficient neuromorphic computing through an event-driven paradigm. While training methods and computational models have greatly advanced, SNNs struggle to achieve competitive performance in visual long-sequence modeling tasks. In artificial neural networks, the effective receptive field (ERF) serves as a valuable tool for analyzing feature extraction capabilities in visual long-sequence modeling. Inspired by this, we introduce the Spatio-Temporal Effective Receptive Field (ST-ERF) to analyze the ERF distributions across various Transformer-based SNNs. Based on the proposed ST-ERF, we reveal that these models suffer from establishing a robust global ST-ERF, thereby limiting their visual feature modeling capabilities. To overcome this issue, we propose two novel channel-mixer architectures: multilayer-perceptron-based mixer (MLPixer) and splash-and-reconstruct block (SRB). These architectures enhance global spatial ERF through all timesteps in early network stages of Transformer-based SNNs, improving performance on challenging visual long-sequence modeling tasks. Extensive experiments conducted on the Meta-SDT variants and across object detection and semantic segmentation tasks further validate the effectiveness of our proposed method. Beyond these specific applications, we believe the proposed ST-ERF framework can provide valuable insights for designing and optimizing SNN architectures across a broader range of tasks.


f5ccb3ab757131a93586ef61ec701533-Supplemental-Conference.pdf

Neural Information Processing Systems

In this section, we compare the symmetric solutions found in erf [2] and ReLU networks [5] to our one-neuron solution (n =1). The main difference is that both earlier studies constrain the search space to the symmetric subspace whereas we first prove that the non-trivial critical points are contained in this subspace in Theorem 5.1 for a broad class of activation functions, including erf and ReLU. Solving the low-dimensional loss, we recover the same solution for ReLU and erf as in [2, 5] for unit-orthonormal teachers.


Should Under-parameterized Student Networks Copy or Average Teacher Weights?

Neural Information Processing Systems

Any continuous function f can be approximated arbitrarily well by a neural network with sufficiently many neurons k. We consider the case when f itself is a neural network with one hidden layer and k neurons. Approximating f with a neural network with n < k neurons can thus be seen as fitting an under-parameterized "student" network with nneurons to a "teacher" network with k neurons. As the student has fewer neurons than the teacher, it is unclear, whether each of the n student neurons should copy one of the teacher neurons or rather average a group of teacher neurons. For shallow neural networks with erf activation function and for the standard Gaussian input distribution, we prove that "copy-average" configurations are critical points if the teacher's incoming vectors are orthonormal and its outgoing weights are unitary. Moreover, the optimum among such configurations is reached when n 1student neurons each copy one teacher neuron and the n-th student neuron averages the remaining k n+1 teacher neurons. For the student network with n = 1 neuron, we provide additionally a closed-form solution of the non-trivial critical point(s) for commonly used activation functions through solving an equivalent constrained optimization problem. Empirically, we find for the erf activation function that gradient flow converges either to the optimal copy-average critical point or to another point where each student neuron approximately copies a different teacher neuron. Finally, we find similar results for the ReLU activation function, suggesting that the optimal solution of underparameterized networks has a universal structure.








cd10c7f376188a4a2ca3e8fea2c03aeb-Paper.pdf

Neural Information Processing Systems

Global information is essential for dense prediction problems, whose goal is to compute adiscrete or continuous label for each pixel in the images. Traditional convolutional layers in neural networks, initially designed for image classification, are restrictive in these problems since the filter size limits their receptive fields. In this work, we propose to replace any traditional convolutional layer with an autoregressivemoving-average (ARMA) layer,anovelmodule with an adjustable receptive field controlled by the learnable autoregressive coefficients.