2x Faster Language Model Pre-training via Masked Structural Growth
