Learning 1D Causal Visual Representation with De-focus Attention Networks

Chenxin Tao

Neural Information Processing Systems 

While images typically require 2D non-causal modeling, text uses 1D causal modeling.
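To make the distinction concrete, the following is a minimal sketch of the two attention patterns. It is an illustration only, not the method of this paper: the function names `attention_mask` and `flatten_patches` are hypothetical, and the raster-scan ordering used to turn a 2D patch grid into a 1D sequence is an assumption.

```python
def attention_mask(num_tokens, causal):
    """Build a num_tokens x num_tokens 0/1 mask.

    mask[i][j] == 1 means token i may attend to token j.
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


def flatten_patches(height, width):
    """Raster-scan a height x width patch grid into a 1D list of (row, col).

    The raster-scan order is an assumption made for this illustration.
    """
    return [(r, c) for r in range(height) for c in range(width)]


# A 2x2 grid of image patches, flattened into 4 tokens.
patches = flatten_patches(2, 2)

# Non-causal: every patch attends to every other patch.
non_causal = attention_mask(len(patches), causal=False)

# Causal: each token attends only to itself and to earlier tokens.
causal = attention_mask(len(patches), causal=True)

for row in causal:
    print(row)
```

Under the causal mask, the first patch in the sequence sees only itself, while the last one sees the whole image. Under the non-causal mask, every patch has the same full view.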