Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition
