Approximating Two-Layer Feedforward Networks for Efficient Transformers
