Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
