M2T: Masking Transformers Twice for Faster Decoding
Fabian Mentzer, Eirikur Agustsson, Michael Tschannen
In MaskGIT [11], the authors use a VQ-GAN [16] to map images to vector-quantized tokens, and learn a transformer to predict the distribution of these tokens. The key novelty of the approach was to use BERT-like [13] random masks during training, and then to predict tokens in groups during inference, sampling tokens in the same group in parallel at each inference step. Thereby, each inference step is conditioned on the tokens generated in previous steps. A big advantage of BERT-like training with grouped inference versus prior state-of-the-art is that considerably fewer steps are required to produce realistic images (typically 10-20, rather than one per token).

Motivated by this, we aim to employ masked transformers for neural image compression (see Figure 1). Previous work has used masked and unmasked transformers in the entropy model for video compression [37, 25] and image compression [29, 22, 15]. However, these models are often either prohibitively slow [22] or lag in rate-distortion performance [29, 15]. In this paper, we show a conceptually simple transformer-based approach that is state-of-the-art in neural image compression, at practical runtimes. The model uses off-the-shelf transformers and does not rely on …
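To make the grouped-inference procedure concrete, here is a minimal Python sketch (not from the paper) of MaskGIT-style decoding: all positions start masked, and at each step one group of positions is sampled in parallel, conditioned on everything sampled so far. The function `predict_logits` is a hypothetical stand-in for the masked transformer, and the fixed random grouping is a simplification; MaskGIT actually chooses which tokens to reveal by prediction confidence under a masking schedule.

```python
import numpy as np

MASK = -1  # sentinel for "not yet sampled" positions

def grouped_masked_decode(predict_logits, num_tokens, vocab_size,
                          num_steps=12, seed=0):
    """Sample all tokens in `num_steps` grouped inference steps.

    predict_logits: callable mapping the current (partially masked)
      token array of shape (num_tokens,) to logits of shape
      (num_tokens, vocab_size). Stand-in for the masked transformer.
    """
    rng = np.random.default_rng(seed)
    tokens = np.full(num_tokens, MASK, dtype=np.int64)

    # Fixed random split of positions into groups (a simplification;
    # MaskGIT picks which tokens to keep by confidence under a schedule).
    groups = np.array_split(rng.permutation(num_tokens), num_steps)

    for group in groups:
        logits = predict_logits(tokens)  # conditioned on tokens so far
        # Softmax over the vocabulary at every position.
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        # Sample this group's positions "in parallel": none of them
        # sees the others' new values until the next step.
        for pos in group:
            tokens[pos] = rng.choice(vocab_size, p=probs[pos])
    return tokens

# Toy check with a context-free "model" that returns uniform logits.
uniform_model = lambda toks: np.zeros((toks.shape[0], 16))
print(grouped_masked_decode(uniform_model, num_tokens=64, vocab_size=16))
```

In this toy run, 64 tokens are produced in 12 forward passes rather than 64 sequential ones, which is the "typically 10-20 steps, rather than one per token" advantage described above.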
arXiv.org Artificial Intelligence
Apr-14-2023