Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models
