Forgetting Transformer: Softmax Attention with a Forget Gate