Limits to Depth-Efficiencies of Self-Attention

Neural Information Processing Systems 

Zisserman, 2014, He et al., 2016], depth-efficiency was theoretically supported from a variety of