The transition dynamics simply mix an action with a randomly sampled latent. An exponential moving average is then applied for temporal persistence, and the resulting latent is decoded to an image using a pretrained generator.
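The update described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the original implementation: the text does not say how the action and the latent are mixed, so the convex combination in `mix`, the `weight` and `decay` values, the zero initial latent, and the assumption that the action is already embedded in the latent space are all hypothetical. The pretrained generator is left out; each returned latent would be passed to it for decoding.

```python
import random


def mix(action, noise, weight=0.5):
    # Hypothetical mixing rule: a convex combination of the action
    # embedding and the sampled latent (the source does not specify one).
    return [weight * a + (1.0 - weight) * n for a, n in zip(action, noise)]


def transition(prev_latent, action, rng, decay=0.9, weight=0.5):
    # Sample a random latent with the same dimensionality as the action.
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    mixed = mix(action, noise, weight)
    # Exponential moving average: keep most of the previous latent so
    # consecutive latents (and hence decoded frames) change smoothly.
    return [decay * p + (1.0 - decay) * m for p, m in zip(prev_latent, mixed)]


def rollout(actions, dim, seed=0, decay=0.9, weight=0.5):
    rng = random.Random(seed)
    latent = [0.0] * dim  # assumed initial latent
    latents = []
    for action in actions:
        latent = transition(latent, action, rng, decay, weight)
        latents.append(latent)  # each latent would be decoded to an image
    return latents
```

A larger `decay` gives stronger temporal persistence: at `decay=1.0` the latent never changes, while at `decay=0.0` each step depends only on the current action and sample.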
Although various normalization schemes have been proposed, they all follow a common theme: normalize across spatial dimensions and discard the extracted statistics.
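To make the common theme concrete, the sketch below normalizes a single feature-map channel across its spatial dimensions. It is an illustrative example in the style of instance normalization, not any specific scheme from the literature; the function name `spatial_normalize` and the `eps` value are assumptions. The point to notice is that the mean and standard deviation are computed, used once, and never returned.

```python
import math


def spatial_normalize(feature_map, eps=1e-5):
    """Normalize one channel (an H x W nested list) across its spatial dims."""
    values = [v for row in feature_map for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)
    # The extracted statistics (mean, std) are discarded after this line:
    # only the normalized map is returned.
    return [[(v - mean) / std for v in row] for row in feature_map]
```

Because the statistics are thrown away, two inputs that differ only in offset and scale, such as `[[1, 2], [3, 4]]` and `[[10, 20], [30, 40]]`, map to nearly identical outputs, and the information that distinguished them is lost to later layers.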