AdaPM: a Partial Momentum Algorithm for LLM Training
