VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
Neural Information Processing Systems
Using a single projection vector, we then project these individual sub-tokens onto a one-dimensional subspace. Importantly, we observe that this projection vector can be initialized cheaply from first-order batch statistics and then kept fixed throughout training. During the backward pass, we approximately reconstruct the original tokens using the same vector.
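The compress/reconstruct step described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that "first-order batch statistics" means the mean sub-token of the batch, normalized to unit length, and the sub-token size and helper names (`split_into_subtokens`, `init_projection_vector`, `compress`, `reconstruct`) are hypothetical.

```python
import math


def split_into_subtokens(tokens, sub_dim):
    """Split each token (a list of floats) into consecutive sub-tokens of length sub_dim."""
    subtokens = []
    for tok in tokens:
        if len(tok) % sub_dim != 0:
            raise ValueError("token length must be divisible by sub_dim")
        for i in range(0, len(tok), sub_dim):
            subtokens.append(tok[i:i + sub_dim])
    return subtokens


def init_projection_vector(subtokens):
    """ASSUMPTION: 'first-order batch statistics' is taken to mean the mean sub-token,
    normalized to unit length. Computed once, then kept fixed."""
    n, sub_dim = len(subtokens), len(subtokens[0])
    mean = [sum(s[j] for s in subtokens) / n for j in range(sub_dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        raise ValueError("mean sub-token is zero; cannot normalize")
    return [x / norm for x in mean]


def compress(tokens, v):
    """Forward pass: store one scalar per sub-token, its coordinate along v."""
    return [sum(a * b for a, b in zip(s, v)) for s in split_into_subtokens(tokens, len(v))]


def reconstruct(coeffs, v, token_dim):
    """Backward pass: rebuild each sub-token as coeff * v and regroup into tokens."""
    flat = [c * x for c in coeffs for x in v]  # reconstructed sub-tokens, concatenated
    return [flat[i:i + token_dim] for i in range(0, len(flat), token_dim)]


# Toy usage: two 4-dim tokens, sub-tokens of size 2 (hypothetical sizes).
batch = [[1.0, 2.0, 1.0, 2.0], [2.0, 4.0, 2.0, 4.0]]
v = init_projection_vector(split_into_subtokens(batch, 2))  # fixed from here on
coeffs = compress(batch, v)          # 4 scalars saved instead of 8 floats
approx = reconstruct(coeffs, v, 4)   # approximate tokens for the backward pass
```

Storing one scalar per sub-token shrinks each saved token by a factor of the sub-token size, plus the single shared vector. Only the component along `v` survives: in this toy batch every sub-token is parallel to `v`, so reconstruction is exact, whereas a sub-token orthogonal to `v`, such as `[2.0, -1.0]`, would be stored as 0 and its information lost.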
Feb-12-2026, 13:46:59 GMT