Goto

Collaborating Authors

 no-damage and damage case


The latest developments in Zero Shot Learning 2022 part1(Advanced Machine Learning)

#artificialintelligence

Abstract: This work explores an efficient approach to establish a foundational video-text model for tasks including open-vocabulary video classification, text-to-video retrieval, video captioning and video question-answering. We present VideoCoCa that reuses a pretrained image-text contrastive captioner (CoCa) model and adapt it to video-text tasks with minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules (for example, cross-frame attention layer or perceiver resampler) and finetune the modified architecture on video-text data, we surprisingly find that the generative attentional pooling and contrastive attentional pooling layers in the image-text CoCa design are instantly adaptable to flattened frame embeddings'', yielding a strong zero-shot transfer baseline for many video-text tasks. Specifically, the frozen image encoder of a pretrained image-text CoCa takes each video frame as inputs and generates N token embeddings per frame for totally T video frames. We flatten N T token embeddings as a long sequence of frozen video representation and apply CoCa's generative attentional pooling and contrastive attentional pooling on top. All model weights including pooling layers are directly loaded from an image-text CoCa pretrained model.


Zero-Shot Transfer Learning for Structural Health Monitoring using Generative Adversarial Networks and Spectral Mapping

Soleimani-Babakamali, Mohammad Hesam, Soleimani-Babakamali, Roksana, Nasrollahzadeh, Kourosh, Avci, Onur, Kiranyaz, Serkan, Taciroglu, Ertugrul

arXiv.org Artificial Intelligence

Gathering properly labelled, adequately rich, and case-specific data for successfully training a data-driven or hybrid model for structural health monitoring (SHM) applications is a challenging task. We posit that a Transfer Learning (TL) method that utilizes available data in any relevant source domain and directly applies to the target domain through domain adaptation can provide substantial remedies to address this issue. Accordingly, we present a novel TL method that differentiates between the source's no-damage and damage cases and utilizes a domain adaptation (DA) technique. The DA module transfers the accumulated knowledge in contrasting no-damage and damage cases in the source domain to the target domain, given only the target's no-damage case. High-dimensional features allow employing signal processing domain knowledge to devise a generalizable DA approach. The Generative Adversarial Network (GAN) architecture is adopted for learning since its optimization process accommodates high-dimensional inputs in a zero-shot setting. At the same time, its training objective conforms seamlessly with the case of no-damage and damage data in SHM since its discriminator network differentiates between real (no damage) and fake (possibly unseen damage) data. An extensive set of experimental results demonstrates the method's success in transferring knowledge on differences between no-damage and damage cases across three strongly heterogeneous independent target structures. The area under the Receiver Operating Characteristics curves (Area Under the Curve - AUC) is used to evaluate the differentiation between no-damage and damage cases in the target domain, reaching values as high as 0.95. With no-damage and damage cases discerned from each other, zero-shot structural damage detection is carried out. The mean F1 scores for all damages in the three independent datasets are 0.978, 0.992, and 0.975.