Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
