Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
Neural Information Processing Systems
To address this issue, we introduce a novel binarization technique called Mixture of Scales (BinaryMoS).
Oct-10-2025, 21:55:01 GMT
- Country:
- Asia > South Korea > Seoul > Seoul (0.04)
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Education (0.67)
- Technology: