Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression

Neural Information Processing Systems 

Visual Autoregressive (VAR) modeling has garnered significant attention for its innovative next-scale prediction approach, which yields substantial improvements in efficiency, scalability, and zero-shot generalization.