Sequential Memory with Temporal Predictive Coding Supplementary Materials
In Algorithm 1 we present the memorizing and recalling procedures of the single-layer tPC.

Algorithm 1: Memorizing and recalling with single-layer tPC

Here we present the proof of Property 1 in the main text: the single-layer tPC can be viewed as a "whitened" version of the AHN. When applied to the data sequence, it whitens the data (Eq. 16 in the main text). These observations are consistent with our numerical results shown in Figure 1: MCAHN has a much larger MSE than the tPC because of its entirely wrong recalls. Figure 1 also presents the online recall results of the models on MovingMNIST, CIFAR10 and UCF101. In Figure 4 we show a natural example of an aliased sequence, in which a movie of a human doing push-ups is memorized and recalled by the model.
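The memorize-and-recall procedure of Algorithm 1 can be illustrated with a minimal sketch. This assumes a linear single-layer tPC whose weights are trained by gradient descent on the temporal prediction error; the function names, learning rate, and epoch count are illustrative choices, not the paper's settings.

```python
import numpy as np

def tpc_memorize(seq, lr=0.05, epochs=300):
    """Learn a weight matrix W that predicts x_t from x_{t-1}
    by gradient descent on the squared temporal prediction error."""
    d = seq.shape[1]
    W = np.zeros((d, d))
    for _ in range(epochs):
        for t in range(1, len(seq)):
            err = seq[t] - W @ seq[t - 1]        # temporal prediction error
            W += lr * np.outer(err, seq[t - 1])  # error-driven weight update
    return W

def tpc_recall(W, cue):
    """Recall the next item in the sequence from a cue of the previous one."""
    return W @ cue

# memorize a short random sequence, then recall item t+1 from item t
rng = np.random.default_rng(0)
seq = rng.standard_normal((5, 8))  # 5 patterns, 8 dimensions each
W = tpc_memorize(seq)
recalled = tpc_recall(W, seq[2])   # should approximate seq[3]
```

With short random sequences the linear predictor can interpolate the pairs exactly, so recall error shrinks toward zero as training proceeds.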
Discussion and implementation details.

[All Reviewers] Related work. We agree with the reviewers that a more extended discussion of related work is required. We note that a recent arXiv paper (MemDPC, to appear in ECCV 2020, by Han et al.) has also used both RGB and optical flow; that model is trained on UCF101 with the same schedule as CoCLR for a fair comparison. We will add these discussions. We actually used the same augmentation as DPC in their released codebase. These are the core contributions of our paper. The CoCLR-RGB model gets 70.2% by linear probing.
Unsupervised Video Continual Learning via Non-Parametric Deep Embedded Clustering
Kurpukdee, Nattapong, Bors, Adrian G.
We propose a realistic scenario for unsupervised video learning in which neither task boundaries nor labels are provided when learning a succession of tasks, and we provide a non-parametric solution to the under-explored problem of unsupervised video continual learning. Videos are a complex and rich spatio-temporal medium, widely used in many applications, but they have not been sufficiently explored in unsupervised continual learning. Prior studies have focused on supervised continual learning, relying on knowledge of labels and task boundaries, yet labeled data is costly and often impractical to obtain. To address this gap, we study unsupervised video continual learning (uVCL), which raises additional challenges due to the extra computational and memory requirements of processing videos compared to images. We introduce a general benchmark experimental protocol for uVCL by considering the learning of unstructured video data categories during each task. We propose to use Kernel Density Estimation (KDE) over deep embedded video features, extracted by unsupervised video transformer networks, as a non-parametric probabilistic representation of the data. We introduce a novelty detection criterion for incoming new-task data that dynamically enables the expansion of memory clusters, aiming to capture new knowledge when learning a succession of tasks. We leverage transfer learning from previous tasks as the initial state for knowledge transfer to the current learning task. We find that the proposed methodology substantially enhances the performance of the model when successively learning many tasks. We perform in-depth evaluations on three standard video action recognition datasets, UCF101, HMDB51, and Something-Something V2, without using any labels or class boundaries.
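The KDE-based novelty criterion described in the abstract can be sketched as follows: fit a Gaussian KDE on the features of the existing memory clusters and flag an incoming sample as novel when its log-density falls below a threshold, triggering cluster expansion. The bandwidth and threshold values here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def gaussian_kde_logdensity(x, centers, bandwidth):
    """Log-density of a Gaussian KDE fitted on `centers`, evaluated at x."""
    d = centers.shape[1]
    sq = np.sum((centers - x) ** 2, axis=1)
    log_k = -sq / (2 * bandwidth**2) - 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    # log-mean-exp over kernels, computed stably
    return np.logaddexp.reduce(log_k) - np.log(len(centers))

def is_novel(x, centers, bandwidth=1.0, threshold=-20.0):
    """Flag x as novel (i.e., start a new memory cluster) when its density
    under the current clusters falls below the threshold."""
    return gaussian_kde_logdensity(x, centers, bandwidth) < threshold

# toy usage: features near the existing cluster are familiar, far ones novel
rng = np.random.default_rng(0)
centers = rng.normal(0.0, 0.5, size=(20, 4))  # stored cluster features
familiar = is_novel(centers[0], centers)       # False
novel = is_novel(np.full(4, 10.0), centers)    # True
```

In a continual-learning loop, a True result would trigger allocating a new cluster and adding the sample's feature to the memory, rather than assigning it to an existing cluster.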
Toward Lightweight and Fast Decoders for Diffusion Models in Image and Video Generation
Buzovkin, Alexey, Shilov, Evgeny
We investigate methods to reduce inference time and memory footprint in Stable Diffusion models by introducing lightweight decoders for both image and video synthesis. Traditional latent diffusion pipelines rely on large Variational Autoencoder (VAE) decoders that slow down generation and consume considerable GPU memory. We propose custom-trained decoders based on lightweight Vision Transformer and Taming Transformers architectures. Experiments show up to 15% overall speed-ups for image generation on COCO2017 and up to 20x faster decoding in the decoder sub-module, with additional gains on UCF-101 for video tasks. Memory requirements are moderately reduced, and while there is a small drop in perceptual quality compared to the default decoder, the improvements in speed and scalability are crucial for large-scale inference scenarios such as generating 100K images. Our work is further contextualized by advances in efficient video generation, including dual masking strategies, illustrating a broader effort to improve the scalability and efficiency of generative models.
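To illustrate why a lightweight decoder is cheap, the ViT-style idea can be reduced to its essence: map each latent token to one image patch with a single linear projection, then stitch the patches into the image. This toy numpy sketch (the shapes, names, and random weights are our own, not the paper's architecture) shows the shape bookkeeping; a real decoder would use a trained projection and a few transformer blocks.

```python
import numpy as np

def lightweight_decode(latent, w_patch, patch=8):
    """Decode an (h, w, c) latent grid into an (h*patch, w*patch, 3) RGB image
    by mapping every latent vector to one patch with a single linear layer."""
    h, w, c = latent.shape
    patches = latent.reshape(h * w, c) @ w_patch        # (h*w, patch*patch*3)
    patches = patches.reshape(h, w, patch, patch, 3)
    # interleave the patch axes with the grid axes to stitch a full image
    return patches.transpose(0, 2, 1, 3, 4).reshape(h * patch, w * patch, 3)

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 4, 16))         # 4x4 grid of 16-dim latents
w_patch = rng.standard_normal((16, 8 * 8 * 3))   # stand-in for a trained projection
img = lightweight_decode(latent, w_patch)        # shape (32, 32, 3)
```

The decoder here is one matrix multiply per image, which is the source of the speed-up; the quality gap relative to a deep convolutional VAE decoder is what the custom training in the paper aims to close.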