VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections

Neural Information Processing Systems 

Using a single projection vector, we then project these individual sub-tokens onto a one-dimensional subspace. Importantly, we find that this projection vector can be initialized cheaply from first-order batch statistics and then kept fixed throughout training. During the backward pass, we reconstruct the original tokens using the same vector.
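The compress-and-reconstruct step can be illustrated with a small sketch. It is not the authors' implementation. It makes these assumptions: each token of length D is split into contiguous sub-tokens of a chosen size M (`sub_size`); "first-order batch statistics" is read as the normalized mean sub-token of a batch; tensors and autograd are replaced by plain Python lists. The function names `split_into_subtokens`, `init_projection`, `compress`, and `reconstruct` are hypothetical. The forward pass keeps one scalar per sub-token, its coefficient along the fixed unit vector v. The backward pass rebuilds each sub-token as that coefficient times v, which is the orthogonal projection of the sub-token onto span(v).

```python
import math
from typing import List

Vector = List[float]


def split_into_subtokens(token: Vector, sub_size: int) -> List[Vector]:
    """Split one token of length D into D // sub_size contiguous sub-tokens."""
    if len(token) % sub_size != 0:
        raise ValueError("token length must be divisible by sub_size")
    return [token[i:i + sub_size] for i in range(0, len(token), sub_size)]


def init_projection(batch: List[Vector], sub_size: int) -> Vector:
    """Assumed initialization: the mean of all sub-tokens in one batch, scaled to unit length.

    The vector is computed once and then kept fixed (never updated).
    """
    subs = [s for tok in batch for s in split_into_subtokens(tok, sub_size)]
    mean = [sum(col) / len(subs) for col in zip(*subs)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0:
        raise ValueError("mean sub-token is zero; cannot normalize")
    return [x / norm for x in mean]


def compress(token: Vector, v: Vector) -> Vector:
    """Forward pass: store a single scalar <s, v> per sub-token s."""
    return [sum(a * b for a, b in zip(s, v))
            for s in split_into_subtokens(token, len(v))]


def reconstruct(coeffs: Vector, v: Vector) -> Vector:
    """Backward pass: approximate each sub-token as c * v and concatenate."""
    return [c * x for c in coeffs for x in v]


# Usage: two tokens of length 4, split into sub-tokens of size 2.
batch = [[1.0, 2.0, 1.0, 2.0], [3.0, 6.0, 2.0, 4.0]]
v = init_projection(batch, sub_size=2)     # fixed rank-1 projection vector
coeffs = compress(batch[1], v)             # 2 stored scalars instead of 4 values
approx = reconstruct(coeffs, v)            # approximate token for the backward pass
```

With sub-token size M, each token is stored as D / M scalars rather than D values. In this toy batch every sub-token is parallel to v, so the reconstruction is exact. In general, the component of each sub-token orthogonal to v is discarded.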
