Backpropagation
Reviews: Spike-Train Level Backpropagation for Training Deep Recurrent Spiking Neural Networks
The authors propose a variant of the backpropagation through time (BPTT) algorithm for spiking neural networks (SNNs). An interesting aspect is that, instead of unrolling the network computation over time, backpropagation is performed over spike trains. The algorithm is tested on various datasets, achieving state-of-the-art results for SNNs. The approach is very original and innovative. The results are very good and of interest to the spiking neural network community.
HKAN: Hierarchical Kolmogorov-Arnold Network without Backpropagation
Dudek, Grzegorz, Rodak, Tomasz
This paper introduces the Hierarchical Kolmogorov-Arnold Network (HKAN), a novel network architecture that offers a competitive alternative to the recently proposed Kolmogorov-Arnold Network (KAN). Unlike KAN, which relies on backpropagation, HKAN adopts a randomized learning approach, where the parameters of its basis functions are fixed, and linear aggregations are optimized using least-squares regression. HKAN utilizes a hierarchical multi-stacking framework, with each layer refining the predictions from the previous one by solving a series of linear regression problems. This non-iterative training method simplifies computation and eliminates sensitivity to local minima in the loss function. Empirical results show that HKAN delivers comparable, if not superior, accuracy and stability relative to KAN across various regression tasks, while also providing insights into variable importance. The proposed approach seamlessly integrates theoretical insights with practical applications, presenting a robust and efficient alternative for neural network modeling.
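The training idea described in the abstract (fixed basis-function parameters, with only the linear aggregation solved by least squares) can be sketched roughly as follows. This is a minimal single-layer illustration, not the authors' HKAN: the Gaussian basis, the `width` value, the function name `fit_random_basis_regressor`, and the toy sine target are all assumptions made for the example, and the hierarchical stacking of layers is omitted.

```python
import numpy as np

def fit_random_basis_regressor(X, y, n_basis=50, width=0.5, seed=0):
    """Fit a linear readout on fixed random Gaussian basis functions.

    Hypothetical helper: the basis type and bandwidth are illustrative
    choices, not taken from the HKAN paper.
    """
    rng = np.random.default_rng(seed)
    # Basis parameters are drawn once and never trained.
    centers = rng.uniform(X.min(axis=0), X.max(axis=0),
                          size=(n_basis, X.shape[1]))

    def features(Z):
        # Squared distance from every sample to every basis center.
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        phi = np.exp(-d2 / (2.0 * width ** 2))
        # Append a constant column so the readout has a bias term.
        return np.hstack([phi, np.ones((Z.shape[0], 1))])

    # Non-iterative training: one least-squares solve, no backpropagation.
    weights, *_ = np.linalg.lstsq(features(X), y, rcond=None)
    return lambda Z: features(Z) @ weights

# Toy regression problem (assumed for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0])

predict = fit_random_basis_regressor(X, y)
mse = float(np.mean((predict(X) - y) ** 2))
print("training MSE:", mse)
```

Because the only fitted quantity is the solution of a linear least-squares problem, the fit in this sketch has no dependence on an initial guess for the weights and no learning rate to tune.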
Reviews: A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation
The paper proposes a method that reduces the memory consumption of backpropagation and, as a result, allows the use of larger batch sizes. The authors state that this is significant, e.g., with batch norm, where batch size matters. In my own experience, I believe this is also significant for improved GPU utilization and data-parallel training. The paper is original in that I have not seen a similar treatment (although the actual solution may have been relatively straightforward; asking the right question was an important part of this). The paper is well written and easy to follow.
Reviews: A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation
This work formalizes the problem of minimizing memory consumption through recomputation when performing a forward-backprop evaluation of a computation graph; it provides an optimal dynamic programming algorithm and an efficient heuristic, and demonstrates strong improvements in memory savings over existing methods. Three expert reviewers initially assessed the paper as 8/8/5, and the authors provided a detailed rebuttal. All reviewers took part in a discussion and the final assessment is 8/8/6, with reviewer consensus that this method is practically useful and the reported gains are strong. Overall this work is a nice contribution.
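The trade-off behind recomputation can be illustrated on a simple chain of layers. The sketch below uses fixed-interval checkpointing, which is a much simpler strategy than the paper's graph-theoretic dynamic program; the scalar `tanh` layers, the function names, and the `segment` parameter are all assumptions made for the example.

```python
import math

def forward_layer(w, x):
    return math.tanh(w * x)

def backward_layer(w, x, grad_out):
    # d/dx tanh(w * x) = w * (1 - tanh(w * x)^2)
    return grad_out * w * (1.0 - math.tanh(w * x) ** 2)

def grad_full_storage(weights, x0):
    """Ordinary backprop: keep the input of every layer."""
    inputs, x = [], x0
    for w in weights:
        inputs.append(x)
        x = forward_layer(w, x)
    g = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        g = backward_layer(w, x_in, g)
    return g, len(inputs)

def grad_with_recompute(weights, x0, segment=4):
    """Keep one checkpoint per segment and recompute the rest on demand."""
    checkpoints, x = {}, x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x  # only segment inputs are stored
        x = forward_layer(w, x)
    g, peak = 1.0, 0
    for start in sorted(checkpoints, reverse=True):
        seg_w = weights[start:start + segment]
        # Re-run the forward pass inside this segment only.
        inputs, x = [], checkpoints[start]
        for w in seg_w:
            inputs.append(x)
            x = forward_layer(w, x)
        # Count of stored values (an upper bound; the checkpoint is counted twice).
        peak = max(peak, len(checkpoints) + len(inputs))
        for w, x_in in zip(reversed(seg_w), reversed(inputs)):
            g = backward_layer(w, x_in, g)
    return g, peak

weights = [0.5 + 0.1 * i for i in range(16)]
g_full, stored_full = grad_full_storage(weights, 0.3)
g_ckpt, stored_ckpt = grad_with_recompute(weights, 0.3, segment=4)
print(g_full, g_ckpt, stored_full, stored_ckpt)
```

Both functions return the same gradient, but the second holds fewer activations at any one time at the cost of one extra forward pass per segment. Choosing which values to keep in a general computation graph, rather than on a chain with evenly spaced checkpoints, is the optimization problem the paper addresses.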
Review for NeurIPS paper: Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks
Weaknesses: Some points that need clarification: 1) The authors argue that their approach is better because it allows neural networks to respond within very few timesteps. The results are reported after 5 timesteps. I am not sure whether, for such very fast responses, the dynamics of the network play any role. For example, training on MNIST requires the correct output neuron to emit a spike in the second time step. It seems there is just one volley of activity passing instantaneously through the network.
Review for NeurIPS paper: GAIT-prop: A biologically plausible learning rule derived from backpropagation of error
I believe this paper makes a meaningful contribution to this line of work and have changed my score accordingly to support acceptance. I do have a few comments that I hope you will consider as you prepare a final version of this paper, mainly coming from a neuroscience perspective. While the method described in this paper advances the family of target prop-related models and may serve as a foundation for future work in bio-plausible learning models, I don't think it is appropriate to describe it as more biologically plausible than backpropagation. One of the commonly cited biologically implausible features of backpropagation (weight symmetry) is replaced here by an equally implausible mechanism (perfect inverse models). It is true that bio-plausible ways of approximating inverses may exist, but there are also proposals for bio-plausible ways of maintaining weight symmetry (e.g.
Review for NeurIPS paper: GAIT-prop: A biologically plausible learning rule derived from backpropagation of error
This paper presents a biologically plausible learning rule as an alternative to standard back-propagation. This is a heavily studied area in ML, with strong interest from both the ML and computational neuroscience communities. The reviewers agreed that this work presents an exciting and important contribution over the existing literature on this problem. There was extensive discussion between reviewers, with two reviewers championing the paper for acceptance. The lower scoring reviewers cited the empirical evaluation as a weakness of the paper, while others argued that the idea on its own was sufficiently interesting to the community.
Reviews: Backpropagation-Friendly Eigendecomposition
This is a point of confusion throughout the paper. In line 57, you start with 'given an input matrix M'. Listing the SVD / QR as a method to compute the ED is not correct here. Later you clarify that M is considered to be a covariance matrix. You should clarify at the very beginning that the scope of your method is limited to symmetric matrices!
Reviews: Backpropagation-Friendly Eigendecomposition
There is mixed feedback in the reviews on the significance of the paper. In my view it is a fundamental building block in deep learning to be able to include modules that perform eigendecompositions in a stable and differentiable (via backprop) manner. While the idea of the paper is simple, i.e. combine the advantages of SVD and power iterations by using them in the forward and backward pass, respectively, there is real insight (see stability bounds) and solid empirical evidence (100% stability even in relatively high dimensions). I think this clarity - simple, but important idea, well analyzed - is a feature and makes this paper a valuable contribution.
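As a small illustration of the two routines mentioned above, the sketch below checks that power iteration and a direct symmetric eigendecomposition agree on the dominant eigenpair of a symmetric matrix. It does not implement the paper's method (in particular, no backward pass is computed); the test matrix with a hand-picked spectrum, the iteration count, and the function name `power_iteration` are assumptions made for the example.

```python
import numpy as np

def power_iteration(M, n_iter=200, seed=0):
    """Estimate the dominant eigenpair of a symmetric matrix M."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = M @ v
        v /= np.linalg.norm(v)
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    return float(v @ M @ v), v

# Build a symmetric positive-definite matrix with a known spectrum
# (illustrative values, chosen so the top eigenvalue is well separated).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
spectrum = np.array([3.0, 2.0, 1.0, 0.5, 0.1])
M = Q @ np.diag(spectrum) @ Q.T
M = (M + M.T) / 2.0  # remove rounding asymmetry

# Direct decomposition; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(M)
lam_ref, v_ref = float(eigvals[-1]), eigvecs[:, -1]

lam_pi, v_pi = power_iteration(M)
# Eigenvectors are defined up to sign, so compare via |cosine|.
alignment = abs(float(v_pi @ v_ref))
print(lam_ref, lam_pi, alignment)
```

The convergence rate of power iteration depends on the ratio between the two largest eigenvalues, which is why the example uses a well-separated spectrum. The restriction to symmetric matrices, raised in the review above, matters here as well: `np.linalg.eigh` assumes a symmetric (or Hermitian) input.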