Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models

Neural Information Processing Systems 

To address this issue, we introduce a novel binarization technique called Mixture of Scales (BinaryMoS).
