DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
