Muon: Training and Trade-offs with Latent Attention and MoE