Combiner: Full Attention Transformer with Sparse Computation Cost
