Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models

Dongwon Jo¹, Taesu Kim², Yulhwa Kim³
