Knop, Szymon
LocoGAN -- Locally Convolutional GAN
Struski, Łukasz, Knop, Szymon, Tabor, Jacek, Daniec, Wiktor, Spurek, Przemysław
In the paper we construct a fully convolutional GAN model: LocoGAN, whose latent space is given by noise-like images of possibly different resolutions. The learning is local, i.e. we process not the whole noise-like image, but sub-images of a fixed size. We add extra channels with spatial information to the input noise images. This architecture and the design of the latent space allow us to use an input of various dimensions. We use that to train our model only on parts of the latent image, see Figure 1. We call this approach local learning. Section 3 contains the detailed description of this approach.
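A minimal PyTorch sketch of such a noise-like latent is given below; it illustrates the description above and is not code from the paper: the number of channels, the two-channel coordinate encoding, and the patch size are assumed values.

```python
import torch

def make_latent(noise_channels=3, height=12, width=12):
    """Noise-like latent image with two extra channels carrying spatial coordinates."""
    noise = torch.randn(noise_channels, height, width)
    ys = torch.linspace(-1.0, 1.0, height)
    xs = torch.linspace(-1.0, 1.0, width)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.cat([noise, yy.unsqueeze(0), xx.unsqueeze(0)], dim=0)

def random_subimage(latent, size=4):
    """Crop a fixed-size sub-image of the latent; only this part is processed in one step."""
    _, h, w = latent.shape
    top = torch.randint(0, h - size + 1, (1,)).item()
    left = torch.randint(0, w - size + 1, (1,)).item()
    return latent[:, top:top + size, left:left + size]

z = make_latent()              # full noise-like latent, shape (5, 12, 12)
patch = random_subimage(z, 4)  # local training input, shape (5, 4, 4)
```

Because the generator is fully convolutional, a latent of a different height and width can be fed through the same network, which is what allows inputs of various dimensions.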
Target Layer Regularization for Continual Learning Using Cramer-Wold Generator
Mazur, Marcin, Pustelnik, Łukasz, Knop, Szymon, Pagacz, Patryk, Spurek, Przemysław
The concept of continual learning (CL), which aims to reduce the distance between human and artificial intelligence, has recently been considered by the deep learning community as one of its main challenges. Generally speaking, it means the ability of a neural network to effectively learn consecutive tasks (in either supervised or unsupervised scenarios) while trying to prevent forgetting of already learned information. Therefore, when designing an appropriate strategy, it needs to be ensured that the network weights are updated in such a way that they correspond to both the current and all previous tasks. In practice, however, it is quite likely that a constructed CL model will suffer from either intransigence (difficulty in acquiring new knowledge, see Chaudhry et al. [2018]) or the catastrophic forgetting (CF) phenomenon (a tendency to lose past knowledge, see McCloskey and Cohen [1989]). In recent years, methods of overcoming the above-mentioned problems have been the subject of wide and intensive investigation.
Generative models with kernel distance in data space
Knop, Szymon, Mazur, Marcin, Spurek, Przemysław, Tabor, Jacek, Podolak, Igor
Generative models dealing with modeling a joint data distribution are generally either autoencoder or GAN based. Both have their pros and cons: the former tend to generate blurry images, while the latter can be unstable in training and prone to the mode collapse phenomenon. The objective of this paper is to construct a model situated between the above architectures, one that does not inherit their main weaknesses. The proposed LCW generator (Latent Cramer-Wold generator) resembles a classical GAN in transforming Gaussian noise into data space. What is of utmost importance, instead of a discriminator, the LCW generator uses a kernel distance. No adversarial training is utilized, hence the name generator. It is trained in two phases. First, an autoencoder based architecture, using kernel measures, is built to model a manifold of data. Then we propose a Latent Trick, mapping a Gaussian to the latent space, in order to obtain the final model. This results in very competitive FID values.
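The sketch below illustrates what a kernel distance computed directly in data space can look like; it uses a plain Gaussian-kernel (MMD-style) distance between generated and real batches as a stand-in, which is an assumption rather than the paper's Cramer-Wold formulation.

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian kernel values between rows of a and b."""
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2.0 * sigma ** 2))

def kernel_distance(generated, real, sigma=1.0):
    """Squared MMD-style kernel distance between two samples in data space."""
    k_gg = gaussian_kernel(generated, generated, sigma).mean()
    k_rr = gaussian_kernel(real, real, sigma).mean()
    k_gr = gaussian_kernel(generated, real, sigma).mean()
    return k_gg + k_rr - 2.0 * k_gr

# usage: flatten images to vectors before comparing
fake = torch.randn(64, 784)   # e.g. generator(noise).view(64, -1)
data = torch.randn(64, 784)
loss = kernel_distance(fake, data)
```

Since no discriminator is trained, the generator is optimized directly against this sample-based distance.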
One-element Batch Training by Moving Window
Spurek, Przemysław, Knop, Szymon, Tabor, Jacek, Podolak, Igor, Wójcik, Bartosz
Several deep models, especially generative ones, compare samples from two distributions (e.g. WAE-like AutoEncoder models, set-processing deep networks, etc.) in their cost functions. With all these methods one cannot train the model directly on small (in the extreme, one-element) batches, due to the fact that samples have to be compared. We propose a generic approach to training such models using one-element mini-batches. The idea is based on splitting the batch in latent space into parts: previous, i.e. historical, elements used for latent space distribution matching, and the current ones, used both for the latent distribution computation and the minimization process. Due to the smaller memory requirements, this allows training networks on higher resolution images than in the classical approach.
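A rough PyTorch sketch of this moving-window scheme follows; the encoder, decoder, and the simple moment-matching latent cost are hypothetical stand-ins for the actual model, used only to show how historical latent codes are combined with the current one-element batch.

```python
import torch
from collections import deque

window = deque(maxlen=256)  # historical latent codes, stored without gradients

def latent_matching_loss(z_all):
    """Toy stand-in for a latent distribution-matching cost (match moments of N(0, I))."""
    return z_all.mean(dim=0).pow(2).mean() + (z_all.var(dim=0) - 1.0).pow(2).mean()

def train_step(encoder, decoder, optimizer, x):
    """One training step on a one-element batch x of shape (1, ...)."""
    optimizer.zero_grad()
    z = encoder(x)                                  # current latent, carries gradients
    past = list(window)
    z_all = torch.cat([z] + past, dim=0) if past else z
    loss = (decoder(z) - x).pow(2).mean()           # reconstruction on the current element
    if z_all.shape[0] > 1:                          # distribution matching needs history
        loss = loss + latent_matching_loss(z_all)
    loss.backward()                                 # gradients flow only through z
    optimizer.step()
    window.append(z.detach())                       # keep the current code as history
    return loss.item()
```

Only the current element is encoded, decoded, and backpropagated through, which is why the memory footprint stays small even for high-resolution inputs.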
Sliced generative models
Knop, Szymon, Mazur, Marcin, Tabor, Jacek, Podolak, Igor, Spurek, Przemysław
In this paper we discuss a class of AutoEncoder based generative models built on a one-dimensional sliced approach. The idea is based on reducing the discrimination between samples to the one-dimensional case. Our experiments show that the methods can be divided into two groups. The first consists of methods which are modifications of standard normality tests, while the second is based on classical distances between samples. It turns out that both groups yield correct generative models, but the second one gives a slightly faster decrease rate of the Fréchet Inception Distance (FID).
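As an illustration of the second group, the sketch below computes a classical distance on one-dimensional projections (an averaged sorted-projection, sliced Wasserstein-style distance); treating this particular distance as a representative member of that group is an assumption.

```python
import torch

def sliced_distance(a, b, n_projections=50):
    """Average 1-D sorted-projection distance over random unit directions."""
    dim = a.shape[1]
    directions = torch.randn(n_projections, dim)
    directions = directions / directions.norm(dim=1, keepdim=True)
    pa = a @ directions.t()                  # shape (n_a, n_projections)
    pb = b @ directions.t()
    pa_sorted, _ = torch.sort(pa, dim=0)     # sorting gives the 1-D optimal coupling
    pb_sorted, _ = torch.sort(pb, dim=0)
    return (pa_sorted - pb_sorted).pow(2).mean()

x = torch.randn(128, 16)                     # e.g. an encoded batch
z = torch.randn(128, 16)                     # a sample from the prior
print(sliced_distance(x, z))
```

Methods from the first group would instead apply a one-dimensional normality test statistic to each projected sample.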
Cramer-Wold AutoEncoder
Tabor, Jacek, Knop, Szymon, Spurek, Przemysław, Podolak, Igor, Mazur, Marcin, Jastrzębski, Stanisław
We propose a new generative model, the Cramer-Wold Autoencoder (CWAE). Following WAE, we directly encourage normality of the latent space. Our paper also uses the recent idea from the Sliced WAE (SWAE) model, which uses one-dimensional projections as a method of verifying the closeness of two distributions. The crucial new ingredient is the introduction of a new (Cramer-Wold) metric in the space of densities, which replaces the Wasserstein metric used in SWAE. We show that the Cramer-Wold metric between Gaussian mixtures is given by a simple analytic formula, which removes the sampling necessary to estimate the cost function in the WAE and SWAE models. As a consequence, while drastically simplifying the optimization procedure, CWAE produces samples of perceptual quality matching other SOTA models.
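The sketch below shows a closed-form squared Cramer-Wold distance between a latent sample and N(0, I), written from the definition of the metric rather than copied from the paper; the asymptotic form of phi_D and the Silverman-style choice of gamma are assumptions and should be checked against the paper.

```python
import math
import torch

def phi_d(s, d):
    """Asymptotic approximation of 1F1(1/2; d/2; -s), intended for large d."""
    return (1.0 + 4.0 * s / (2.0 * d - 3.0)).rsqrt()

def cw_distance_to_normal(z):
    """Closed-form squared Cramer-Wold distance between sample z (n x d) and N(0, I)."""
    n, d = z.shape
    gamma = (4.0 / (3.0 * n)) ** 0.4        # Silverman-style bandwidth (assumed choice)
    pair = torch.cdist(z, z).pow(2)
    term_zz = phi_d(pair / (4.0 * gamma), d).mean() / (2.0 * math.sqrt(math.pi * gamma))
    norms = z.pow(2).sum(dim=1)
    term_zn = phi_d(norms / (4.0 * gamma + 2.0), d).mean() / math.sqrt(math.pi * (4.0 * gamma + 2.0))
    term_nn = 1.0 / (2.0 * math.sqrt(math.pi * (gamma + 1.0)))
    return term_zz - 2.0 * term_zn + term_nn

z = torch.randn(256, 64)                    # e.g. an encoded mini-batch
print(cw_distance_to_normal(z))
```

Because the whole expression is analytic in the encoded batch, no random projections or sampled test points are needed to estimate the latent regularization term.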