Early Convolutions Help Transformers See Better
