Agustsson, Eirikur
Finite Scalar Quantization: VQ-VAE Made Simple
Mentzer, Fabian, Minnen, David, Agustsson, Eirikur, Tschannen, Michael
We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ), where we project the VAE representation down to a few dimensions (typically less than 10). Each dimension is quantized to a small set of fixed values, leading to an (implicit) codebook given by the product of these sets. By appropriately choosing the number of dimensions and the number of values each dimension can take, we obtain the same codebook size as in VQ. On top of such discrete representations, we can train the same models that have been trained on VQ-VAE representations, e.g., autoregressive and masked transformer models for image generation, multimodal generation, and dense prediction computer vision tasks. Concretely, we employ FSQ with MaskGIT for image generation, and with UViM for depth estimation, colorization, and panoptic segmentation. Despite the much simpler design of FSQ, we obtain competitive performance in all these tasks. We emphasize that FSQ does not suffer from codebook collapse and does not need the complex machinery employed in VQ (commitment losses, codebook reseeding, code splitting, entropy penalties, etc.) to learn expressive discrete representations.

Vector quantization (VQ), initially introduced by Gray (1984), has recently seen a renaissance in the context of learning discrete representations with neural networks. Spurred by the success of VQ-VAE (Van Den Oord et al., 2017), Esser et al. (2020) and Villegas et al. (2022) showed that training an autoregressive transformer on the representations of a VQ-VAE trained with a GAN loss enables powerful image and video generation models, respectively.
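To make the scheme concrete, here is a minimal NumPy sketch of the quantizer. The level counts are illustrative (and odd, for simplicity; the paper also uses even counts via a half-step offset), the training-time straight-through gradient is only indicated in a comment, and this is not the authors' code:

    import numpy as np

    def fsq(z, levels=(5, 5, 5, 5, 5)):
        """Finite scalar quantization: bound each dimension of z and round
        it to one of levels[i] fixed values. The implicit codebook is the
        product set, here 5^5 = 3125 codes."""
        half = (np.asarray(levels, dtype=float) - 1) / 2
        z_bounded = np.tanh(z) * half          # each dim now in (-half, half)
        z_q = np.round(z_bounded)              # nearest of the fixed levels
        # During training, a straight-through estimator is used:
        # z_q = z_bounded + stop_gradient(z_q - z_bounded)
        return z_q

    def code_index(z_q, levels=(5, 5, 5, 5, 5)):
        """Map a quantized vector to its integer id in the implicit codebook."""
        index = 0
        for value, num_levels in zip(z_q, levels):
            digit = int(value + (num_levels - 1) / 2)   # shift to 0..L-1
            index = index * num_levels + digit
        return index

    z = np.random.default_rng(0).normal(size=5)
    print(fsq(z), code_index(fsq(z)))          # a code id in [0, 3125)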
High-Fidelity Image Compression with Score-based Generative Models
Hoogeboom, Emiel, Agustsson, Eirikur, Mentzer, Fabian, Versari, Luca, Toderici, George, Theis, Lucas
Despite the tremendous success of diffusion generative models in text-to-image generation, replicating this success in the domain of image compression has proven difficult. In this paper, we demonstrate that diffusion can significantly improve perceptual quality at a given bit-rate, outperforming the state-of-the-art approaches PO-ELIC and HiFiC as measured by FID score. This is achieved using a simple but theoretically motivated two-stage approach: an autoencoder targeting MSE, followed by a score-based decoder. However, as we will show, implementation details matter, and the optimal design decisions can differ greatly from those of typical text-to-image models.
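A minimal sketch of the two-stage decoding idea described above (the sampler and the models are stand-ins so the sketch runs; the paper's actual noise schedule and architecture differ):

    import numpy as np

    rng = np.random.default_rng(0)

    def decode_two_stage(x_mse, denoise_step, num_steps=50):
        """Stage 2 of the two-stage scheme: given the MSE-optimized (and
        typically blurry) stage-1 reconstruction x_mse, run a conditional
        diffusion sampler that adds realistic detail at no extra rate.
        denoise_step stands in for a trained score model."""
        x = rng.normal(size=x_mse.shape)        # start from pure noise
        for t in reversed(range(num_steps)):
            x = denoise_step(x, t, x_mse)       # each step conditioned on x_mse
        return x

    # Dummy denoiser so the sketch executes: pull x toward the conditioning.
    x_mse = rng.normal(size=(8, 8, 3))          # pretend stage-1 output
    x_hat = decode_two_stage(x_mse, lambda x, t, c: x + 0.1 * (c - x))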
M2T: Masking Transformers Twice for Faster Decoding
Mentzer, Fabian, Agustsson, Eirikur, Tschannen, Michael
In MaskGIT [11], the authors (see Figure 1) use a VQ-GAN [16] to map images to vector-quantized tokens, and learn a transformer to predict the distribution of these tokens. The key novelty of the approach was to use BERT-like [13] random masks during training, and to then predict tokens in groups during inference, sampling the tokens in the same group in parallel at each inference step. Thereby, each inference step is conditioned on the tokens generated in previous steps. A big advantage of BERT-like training with grouped inference versus prior state-of-the-art is that considerably fewer steps are required to produce realistic images (typically 10-20, rather than one per token). Motivated by this, we aim to employ masked transformers for neural image compression. Previous work has used masked and unmasked transformers in the entropy model for video compression [37, 25] and image compression [29, 22, 15]. However, these models are often either prohibitively slow [22], or lag in rate-distortion performance [29, 15]. In this paper, we show a conceptually simple transformer-based approach that is state-of-the-art in neural image compression, at practical runtimes. The model uses off-the-shelf transformers, and does not rely on …
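A minimal sketch of the grouped inference loop this entry builds on (the masking schedule and the model are stand-ins so the sketch runs, not the paper's implementation):

    import numpy as np

    rng = np.random.default_rng(0)

    def grouped_decode(num_tokens, predict_logits, num_steps=12, vocab=1024):
        """MaskGIT-style grouped inference: the masked transformer
        (stand-in: predict_logits) is called once per group, and all tokens
        in the group are sampled in parallel, conditioned on the tokens
        fixed in earlier steps."""
        tokens = np.full(num_tokens, -1)       # -1 marks a masked position
        groups = np.array_split(rng.permutation(num_tokens), num_steps)
        for group in groups:
            logits = predict_logits(tokens)    # shape (num_tokens, vocab)
            probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
            probs /= probs.sum(axis=-1, keepdims=True)
            for i in group:                    # conceptually parallel
                tokens[i] = rng.choice(vocab, p=probs[i])
        return tokens

    # Dummy model (uniform logits) so the sketch runs end to end.
    tokens = grouped_decode(64, lambda t: np.zeros((64, 1024)))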
Multi-Realism Image Compression with a Conditional Generator
Agustsson, Eirikur, Minnen, David, Toderici, George, Mentzer, Fabian
By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. From a single compressed representation, the receiver can decide to either obtain a low mean squared error reconstruction that is close to the input, a realistic reconstruction with high perceptual quality, or anything in between. With our method, we set a new state-of-the-art in distortion-realism, pushing the frontier of achievable distortion-realism pairs, i.e., our method achieves better distortions at high realism and better realism at low distortion than ever before.
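The receiver-side choice the abstract describes can be sketched as follows; the beta conditioning variable and the decoder interface are assumptions for illustration, not the paper's API:

    import numpy as np

    class DummyDecoder:
        """Stand-in for the conditional generator; the real decoder is a
        neural network."""
        def entropy_decode(self, bitstream):
            return np.frombuffer(bitstream, dtype=np.float32)
        def generate(self, latent, beta):
            detail = np.sin(50.0 * latent)      # fake synthesized texture
            return latent + beta * detail

    def reconstruct(bitstream, decoder, beta):
        """One bitstream, receiver-chosen trade-off: beta = 0.0 gives a
        low-MSE reconstruction close to the input, beta = 1.0 a realistic,
        high-perceptual-quality one; intermediate values interpolate."""
        return decoder.generate(decoder.entropy_decode(bitstream), beta=beta)

    bits = np.linspace(0, 1, 16, dtype=np.float32).tobytes()
    x_faithful = reconstruct(bits, DummyDecoder(), beta=0.0)
    x_realistic = reconstruct(bits, DummyDecoder(), beta=1.0)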
On the advantages of stochastic encoders
Theis, Lucas, Agustsson, Eirikur
Stochastic encoders have been used in rate-distortion theory and neural compression because they can be easier to handle. However, in performance comparisons with deterministic encoders they often do worse, suggesting that noise in the encoding process may generally be a bad idea. It is poorly understood if and when stochastic encoders do better than deterministic encoders. In this paper we provide one illustrative example which shows that stochastic encoders can significantly outperform the best deterministic encoders. Our toy example suggests that stochastic encoders may be particularly useful in the regime of "perfect perceptual quality".
Universally Quantized Neural Compression
Agustsson, Eirikur, Theis, Lucas
A popular approach to learning encoders for lossy compression is to use additive uniform noise during training as a differentiable approximation to test-time quantization. We demonstrate that a uniform noise channel can also be implemented at test time using universal quantization (Ziv, 1985). This allows us to eliminate the mismatch between training and test phases while maintaining a completely differentiable loss function. Implementing the uniform noise channel is a special case of the more general problem of communicating a sample, which we prove is computationally hard if we do not make assumptions about its distribution. However, the uniform special case is efficient as well as easy to implement and thus of great interest from a practical point of view. Finally, we show that quantization can be obtained as a limiting case of a soft quantizer applied to the uniform noise channel, bridging compression with and without quantization.
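A minimal sketch of the test-time uniform noise channel implemented with universal quantization, i.e. subtractive dithering with a dither shared between sender and receiver via a common seed (the entropy coder is omitted):

    import numpy as np

    def encode(x, seed):
        """Sender: add shared dither u ~ U(-1/2, 1/2), round, and pass the
        integers to an entropy coder (omitted here)."""
        u = np.random.default_rng(seed).uniform(-0.5, 0.5, np.shape(x))
        return np.round(x + u).astype(int)

    def decode(k, seed):
        """Receiver: regenerate the same dither from the shared seed and
        subtract it, giving y = round(x + u) - u."""
        u = np.random.default_rng(seed).uniform(-0.5, 0.5, np.shape(k))
        return k - u

    x = np.random.default_rng(1).normal(size=10_000)
    y = decode(encode(x, seed=42), seed=42)
    # The error y - x is uniform on [-1/2, 1/2] and independent of x,
    # exactly the additive-noise channel used during training.
    assert np.all(np.abs(y - x) <= 0.5 + 1e-9)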
Deep Generative Models for Distribution-Preserving Lossy Compression
Tschannen, Michael, Agustsson, Eirikur, Lucic, Mario
We propose and study the problem of distribution-preserving lossy compression. Motivated by recent advances in extreme image compression, which make it possible to maintain artifact-free reconstructions even at very low bitrates, we propose to optimize the rate-distortion tradeoff under the constraint that the reconstructed samples follow the distribution of the training data. The resulting compression system recovers both ends of the spectrum: at zero bitrate it learns a generative model of the data, while at high enough bitrates it achieves perfect reconstruction. Furthermore, for intermediate bitrates it smoothly interpolates between learning a generative model of the training data and perfectly reconstructing the training samples. We study several methods to approximately solve the proposed optimization problem, including a novel combination of Wasserstein GAN and Wasserstein Autoencoder, and present an extensive theoretical and empirical characterization of the proposed compression systems.
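The constrained problem can be read as the relaxed objective E[d(X, X̂)] + λ·W(p_X, p_X̂), distortion plus a distribution-matching penalty. A toy sketch (the moment-matching penalty and the weight λ are illustrative stand-ins; in the paper the penalty is a Wasserstein distance estimated with a GAN critic):

    import numpy as np

    def dplc_objective(x, x_hat, dist_penalty, lam=1.0):
        """Relaxed distribution-preserving objective: distortion plus a
        penalty on the mismatch between the distributions of x and x_hat."""
        distortion = np.mean((x - x_hat) ** 2)     # E[d(X, X_hat)]
        return distortion + lam * dist_penalty(x, x_hat)

    def moment_penalty(x, y):                      # toy: match mean and std
        return abs(x.mean() - y.mean()) + abs(x.std() - y.std())

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)
    print(dplc_objective(x, x + 0.1 * rng.normal(size=1000), moment_penalty))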
Optimal transport maps for distribution preserving operations on latent spaces of Generative Models
Agustsson, Eirikur, Sage, Alexander, Timofte, Radu, Van Gool, Luc
Generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) are typically trained for a fixed prior distribution in the latent space, such as uniform or Gaussian. After a trained model is obtained, one can sample the generator in various ways for exploration and understanding, such as interpolating between two samples, sampling in the vicinity of a sample, or exploring differences between a pair of samples applied to a third sample. In this paper, we show that the latent space operations used in the literature so far induce a distribution mismatch between the resulting outputs and the prior distribution the model was trained on. To address this, we propose to use distribution matching transport maps that make such latent space operations preserve the prior distribution, while minimally modifying the original operation. Our experimental results validate that the proposed operations give higher quality samples than the original operations.
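To illustrate the mismatch and the fix for the Gaussian-prior case (a special case chosen for brevity; the paper's transport maps also cover other priors and operations):

    import numpy as np

    def matched_interpolation(z1, z2, t):
        """Linear interpolation (1-t)*z1 + t*z2 of two N(0, I) latents is
        distributed N(0, ((1-t)^2 + t^2) I), i.e. it mismatches the prior.
        For a Gaussian prior, the distribution-matching map reduces to a
        rescaling that restores N(0, I)."""
        z = (1 - t) * z1 + t * z2
        return z / np.sqrt((1 - t) ** 2 + t ** 2)

    rng = np.random.default_rng(0)
    z1, z2 = rng.normal(size=(2, 100_000))
    print((0.5 * z1 + 0.5 * z2).std())                # ~0.707: mismatched midpoint
    print(matched_interpolation(z1, z2, 0.5).std())   # ~1.0: prior preserved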