Gated Linear Attention Transformers with Hardware-Efficient Training