AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

Neural Information Processing Systems 

We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating k-nearest neighbor retrievals at the patch level. Unlike prior methods that perform a single, static retrieval before generation and condition the entire generation on fixed reference images, AR-RAG performs context-aware retrievals at each generation step, using prior-generated patches as queries to retrieve and incorporate the most relevant patch-level visual references, enabling the model to respond to evolving generation needs while avoiding limitations (e.g., over-copying, stylistic bias, etc.) prevalent in existing methods. To realize AR-RAG, we propose two parallel frameworks: (1) Distribution-Augmentation in Decoding (DAiD), a training-free plug-and-use decoding strategy that directly merges the distribution of model-predicted patches with the distribution of retrieved patches, and (2) Feature-Augmentation in Decoding (FAiD), a parameter-efficient fine-tuning method that progressively smooths the features of retrieved patches via multi-scale convolution operations and leverages them to augment the image generation process.
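As an illustration of the distribution-merging idea behind DAiD, the following is a minimal sketch, not the paper's implementation. It assumes that patches are discrete codebook tokens, that each retrieved neighbor carries a token id and a distance to the query, that the retrieval distribution is a softmax over negative distances, and that the two distributions are combined by linear interpolation with a weight `lam`. The function names, the `temperature` parameter, and the interpolation rule are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty sequence of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieval_distribution(
    neighbors: Sequence[Tuple[int, float]],
    vocab_size: int,
    temperature: float = 1.0,
) -> List[float]:
    """Turn k retrieved (token_id, distance) pairs into a distribution over
    the codebook. Closer neighbors get more mass (assumed weighting scheme)."""
    weights = softmax([-d / temperature for _, d in neighbors])
    dist = [0.0] * vocab_size
    for (token_id, _), w in zip(neighbors, weights):
        dist[token_id] += w  # neighbors sharing a token id pool their mass
    return dist


def merge_distributions(
    model_logits: Sequence[float],
    neighbors: Sequence[Tuple[int, float]],
    lam: float = 0.5,
    temperature: float = 1.0,
) -> List[float]:
    """Interpolate the model's next-patch distribution with the retrieval
    distribution. `lam` is a hypothetical mixing weight in [0, 1]."""
    p_model = softmax(model_logits)
    if not neighbors:
        # Nothing retrieved: fall back to the model's own prediction.
        return p_model
    p_retr = retrieval_distribution(neighbors, len(model_logits), temperature)
    return [(1.0 - lam) * pm + lam * pr for pm, pr in zip(p_model, p_retr)]


# Toy example: a 4-entry codebook, uniform model prediction, and three
# retrieved neighbors, two of which agree on token 2.
merged = merge_distributions(
    model_logits=[0.0, 0.0, 0.0, 0.0],
    neighbors=[(2, 0.1), (2, 0.3), (1, 0.9)],
    lam=0.5,
)
```

In an autoregressive loop, a step like this would run once per generated patch, with the neighbors re-retrieved each time using the already-generated patches as the query. Because the merge only touches the output distribution, no model weights are changed, which is consistent with the training-free description above.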