The Potential of Second-Order Optimization for LLMs: A Study with Full Gauss-Newton
