Review for NeurIPS paper: Untangling tradeoffs between recurrence and self-attention in artificial neural networks

Neural Information Processing Systems 

The paper provides a theoretical analysis of self-attention and vanishing gradients. The experiments are on toy problems and do not reach state-of-the-art results, but they validate the paper's main theoretical contributions.