AITopics | movq

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation A Discussion on Masked Image Reconstruction

Neural Information Processing Systems | Feb-10-2026, 20:45:29 GMT

In other columns, we randomly mask some tokens (first row), and we sample the invisible tokens based on the visible tokens for the second stage. Here, we show top-1 results in 1 step (second row) and random results in 8 steps (third row), respectively. Interestingly, our model with 95% masked tokens (i.e., 12 tokens are visible among 256 tokens in each channel) is able to generate pluralistic images in only one step by selecting the top-1 token. More importantly, the corresponding results reflect identity attributes of the original unmasked inputs. When the tokens are totally masked (i.e., 100% mask ratio), the model generates plausible and diverse results by randomly sampling tokens in multiple steps.
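The masking arithmetic and the two decoding modes described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the codebook size, the `MASK_ID` sentinel, the random `toy_logits` stand-in for the second-stage transformer, and the cosine re-masking schedule are all assumptions made for illustration. Only the 256 tokens per channel, the 95% mask ratio, and the 1-step and 8-step settings come from the text.

```python
import math
import numpy as np

NUM_TOKENS = 256      # tokens per channel, as stated in the text
CODEBOOK_SIZE = 1024  # hypothetical codebook size, chosen for illustration
MASK_ID = -1          # hypothetical sentinel marking a masked position


def mask_tokens(tokens, mask_ratio, rng):
    """Return a copy of `tokens` with ceil(mask_ratio * N) positions masked."""
    n_masked = math.ceil(mask_ratio * tokens.size)
    positions = rng.choice(tokens.size, size=n_masked, replace=False)
    out = tokens.copy()
    out[positions] = MASK_ID
    return out


def toy_logits(tokens, rng):
    """Stand-in for the second-stage transformer: random logits per position."""
    return rng.normal(size=(tokens.size, CODEBOOK_SIZE))


def fill_top1(tokens, rng):
    """One step: every masked position takes its highest-scoring token."""
    logits = toy_logits(tokens, rng)
    out = tokens.copy()
    masked = out == MASK_ID
    out[masked] = logits.argmax(axis=1)[masked]
    return out


def fill_iterative(tokens, steps, rng):
    """Multi-step: sample masked tokens, commit the most confident, re-mask the rest."""
    out = tokens.copy()
    n_start = int((out == MASK_ID).sum())
    for step in range(1, steps + 1):
        masked = np.flatnonzero(out == MASK_ID)
        if masked.size == 0:
            break
        logits = toy_logits(out, rng)[masked]
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        sampled = np.array([rng.choice(CODEBOOK_SIZE, p=p) for p in probs])
        confidence = probs[np.arange(masked.size), sampled]
        # Assumed cosine schedule: how many tokens stay masked after this step.
        n_keep_masked = math.floor(n_start * math.cos(math.pi / 2 * step / steps))
        n_commit = masked.size - n_keep_masked
        commit = np.argsort(-confidence)[:n_commit]
        out[masked[commit]] = sampled[commit]
    return out
```

With a 95% mask ratio, `mask_tokens` hides ceil(0.95 × 256) = 244 positions and leaves 12 visible, matching the figure quoted above. `fill_top1` corresponds to the single-step top-1 setting, and `fill_iterative(tokens, 8, rng)` to random sampling over 8 steps.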

artificial intelligence, movq, this is an extension of fig, (4 more...)

Technology: Information Technology > Artificial Intelligence (0.38)

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Neural Information Processing Systems | Dec-24-2025, 20:07:16 GMT

Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in repeated artifacts in similar adjacent regions when existing decoder architectures are used. To address this issue, we propose incorporating spatially conditional normalization to modulate the quantized vectors, inserting spatially variant information into the embedded index maps and encouraging the decoder to generate more photorealistic images. Moreover, we use multichannel quantization to increase the recombination capability of the discrete codes without increasing the cost of the model or the codebook. Additionally, to generate discrete tokens at the second stage, we adopt a Masked Generative Image Transformer (MaskGIT) to learn an underlying prior distribution in the compressed latent space, which is much faster than a conventional autoregressive model. Experiments on two benchmark datasets demonstrate that our proposed modulated VQGAN greatly improves reconstructed image quality and provides high-fidelity image generation.
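A rough NumPy sketch of the two first-stage ideas in the abstract, multichannel quantization and modulation of decoder features by the quantized vectors, follows. It rests on several assumptions that the abstract does not confirm: that channels are split into equal chunks sharing one codebook, that the scale and shift come from per-position linear maps (`w_gamma` and `w_beta`, hypothetical names), and that the features and quantized latents share the same spatial resolution.

```python
import numpy as np


def quantize_multichannel(z, codebook, n_chunks):
    """Split channels into chunks and map each chunk to its nearest codebook entry.

    z: (H, W, C) continuous latents.
    codebook: (K, C // n_chunks), shared by all chunks (an assumption).
    Returns indices of shape (H, W, n_chunks) and quantized latents (H, W, C).
    """
    h, w, c = z.shape
    chunks = z.reshape(h, w, n_chunks, c // n_chunks)
    # Squared Euclidean distance from every chunk to every codebook entry.
    dists = ((chunks[..., None, :] - codebook) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=-1)
    z_q = codebook[indices].reshape(h, w, c)
    return indices, z_q


def modulate(features, z_q, w_gamma, w_beta, eps=1e-5):
    """Normalize `features`, then scale and shift them per position using z_q.

    features: (H, W, D) decoder activations.
    z_q: (H, W, C) quantized latents at the same resolution (an assumption).
    w_gamma, w_beta: (C, D) hypothetical linear maps applied at each position.
    """
    mean = features.mean(axis=(0, 1), keepdims=True)
    std = features.std(axis=(0, 1), keepdims=True)
    normalized = (features - mean) / (std + eps)
    gamma = z_q @ w_gamma  # (H, W, D): scale that varies across positions
    beta = z_q @ w_beta    # (H, W, D): shift that varies across positions
    return gamma * normalized + beta
```

Because each position is described by `n_chunks` indices rather than one, the number of possible code combinations per position grows from K to K^n_chunks while the codebook itself stays the same size, which is one way to read the abstract's "recombination capability" claim.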

high-fidelity image generation, modulating quantized vector, name change, (3 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.64)
Information Technology > Artificial Intelligence > Machine Learning (0.44)

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation A Discussion on Masked Image Reconstruction

Neural Information Processing Systems | Aug-17-2025, 02:07:48 GMT

This is an extension of Fig.

artificial intelligence, image generation, machine learning, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Neural Information Processing Systems | Aug-17-2025, 02:07:43 GMT

The vision community has rapidly improved image synthesis results on quality, diversity and resolution over a short period of time.

artificial intelligence, machine learning, natural language, (14 more...)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Neural Information Processing Systems | Jan-17-2025, 23:00:11 GMT

high-fidelity image generation, modulating quantized vector, movq

Technology: