AdaGrad Meets Muon: Adaptive Stepsizes for Orthogonal Updates
