Review for NeurIPS paper: Untangling tradeoffs between recurrence and self-attention in artificial neural networks

Additional Feedback:
- Line 145: how can Theorem 1 be related to the early attention mechanism of [1]? Since the attention weights are computed adaptively, they are unlikely to be uniform (see the first sketch after this list).
- MANNs learn to store relevant hidden states in a fixed-size memory, which appears to serve the same purpose as the relevancy screening mechanism (see the second sketch after this list). What is the advantage of the proposed method over MANNs? How are MANNs related to Theorem 2?
- The paper neglects prior work that also aims to quantify gradient propagation in RNNs and attentive models [4,5].
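
To make the first point concrete, here is a minimal sketch (the shapes, random data, and scaled dot-product form are my assumptions, not taken from the paper) showing that softmax attention weights are data-dependent and generally non-uniform, in contrast with the uniform-weight setting that Theorem 1 appears to assume:

```python
# Minimal sketch (all shapes and data are assumptions, not from the paper):
# softmax attention weights are data-dependent and generally non-uniform.
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                       # sequence length, hidden size (arbitrary)

h = rng.standard_normal((T, d))   # past hidden states acting as keys
q = rng.standard_normal(d)        # current query

# Adaptive weights: scaled dot-product attention.
scores = h @ q / np.sqrt(d)
alpha_adaptive = np.exp(scores) / np.exp(scores).sum()

# Uniform weights, as in the setting the theorem seems to assume.
alpha_uniform = np.full(T, 1.0 / T)

print("adaptive:", np.round(alpha_adaptive, 3))   # varies with the data
print("uniform :", alpha_uniform)
```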
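
For the MANN comparison, a second minimal sketch (the relevance score and eviction rule here are hypothetical stand-ins; the paper's actual screening rule is not reproduced) of a fixed-size memory that retains only the most relevant past hidden states:

```python
# Minimal sketch (the relevance score and eviction rule are hypothetical
# stand-ins, not the paper's actual screening rule): a fixed-size memory
# that keeps only the k most relevant past hidden states, MANN-style.
import numpy as np

rng = np.random.default_rng(1)
T, d, k = 12, 8, 4                # steps, hidden size, memory slots (arbitrary)

memory = []                       # list of (relevance, hidden_state) pairs
for t in range(T):
    h_t = rng.standard_normal(d)            # hidden state at step t
    relevance = float(np.linalg.norm(h_t))  # stand-in relevance score
    memory.append((relevance, h_t))
    # Evict the least relevant entry once the fixed budget is exceeded.
    if len(memory) > k:
        weakest = min(range(len(memory)), key=lambda i: memory[i][0])
        memory.pop(weakest)

print("retained relevance scores:", [round(r, 2) for r, _ in memory])
```

If both mechanisms effectively select a bounded set of relevant states in this manner, the authors should clarify what distinguishes their method from a MANN and whether Theorem 2 applies to such memories as well.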