GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
