The Fine-Grained Complexity of Gradient Computation for Training Large Language Models

Neural Information Processing Systems 

To train an LLM, one needs to alternately run 'forward' computations and 'backward' computations.
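The forward/backward alternation can be sketched in miniature. The following is an illustrative example (not from the paper): a scalar model y = w * x trained with squared-error loss, where each training step runs a forward pass to compute the loss and a backward pass to compute the gradient. All function and variable names are hypothetical.

```python
# Illustrative sketch of forward/backward alternation in training.
# Scalar model y = w * x with squared-error loss; names are illustrative.

def forward(w, x, y_true):
    """Forward computation: produce a prediction and its loss."""
    y_pred = w * x
    loss = (y_pred - y_true) ** 2
    return y_pred, loss

def backward(w, x, y_true):
    """Backward computation: gradient of the loss w.r.t. w (chain rule)."""
    y_pred = w * x
    return 2.0 * (y_pred - y_true) * x  # dL/dw

def train(w, data, lr=0.01, steps=100):
    """Alternate forward and backward passes, updating w each time."""
    for _ in range(steps):
        for x, y_true in data:
            _, loss = forward(w, x, y_true)  # forward pass
            grad = backward(w, x, y_true)    # backward pass
            w -= lr * grad                   # gradient-descent update
    return w

# Data generated by y = 2x, so w should approach 2.0.
w = train(0.0, [(1.0, 2.0), (2.0, 4.0)])
print(round(w, 3))
```

In a real LLM the forward pass evaluates many transformer layers and the backward pass backpropagates through all of them, which is exactly the gradient computation whose fine-grained complexity the paper studies.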
