Invertible DenseNets with Concatenated LipSwish
We introduce Invertible Dense Networks (i-DenseNets), a more parameter-efficient extension of Residual Flows. The method relies on an analysis of the Lipschitz continuity of the concatenation in DenseNets, where we enforce invertibility of the network by satisfying the Lipschitz constraint. Furthermore, we propose a learnable weighted concatenation, which not only improves the model's performance but also indicates the importance of the concatenated weighted representation. Additionally, we introduce the Concatenated LipSwish as an activation function, for which we show how to enforce the Lipschitz condition and which boosts performance. The new architecture, i-DenseNet, outperforms Residual Flows and other flow-based models on density estimation evaluated in bits per dimension, under an equal parameter budget. Moreover, we show that the proposed model outperforms Residual Flows when trained as a hybrid model, where the model is both generative and discriminative.
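The activation described above can be illustrated with a minimal scalar sketch, assuming the standard construction: LipSwish divides Swish by 1.1 so that its Lipschitz constant is bounded by 1, and the concatenated variant applies it to both x and -x, doubling the output dimension. The real layer operates on channel tensors; the scalar form here is for illustration only.

```python
import math

def lipswish(x, beta=1.0):
    # Swish(x) = x * sigmoid(beta * x); dividing by 1.1 bounds the
    # Lipschitz constant by 1, since the maximum derivative of Swish
    # is approximately 1.0998.
    return x * (1.0 / (1.0 + math.exp(-beta * x))) / 1.1

def clipswish(x):
    # Concatenated LipSwish: apply LipSwish to the pair (x, -x),
    # doubling the dimension while keeping the Lipschitz bound.
    return [lipswish(x), lipswish(-x)]
```

In the tensor setting, the concatenation happens along the channel axis, so a layer with d input channels produces 2d activated channels.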
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation (Supplementary Materials)
Figure 1 shows a diagram of the training scheme for the cross-modal retrieval module. Each multiple-choice instance consists of the correct vision+audio fusion embedding along with a pose embedding.

Experimental results when one of the modalities is masked:

| Type of Masking | SDR | SIR | SAR |
| --- | --- | --- | --- |
| Masking is used for visual modality | 7.82 | 14.39 | 10.65 |
| Masking is used for pose modality | 12.06 | 18.34 | 14.17 |
| 15% random masking for both visual and pose modalities | 12.34 | 18.76 | 14.37 |

In this paper, we use sound separation as our primary task; therefore, we do not consider masking for the audio modality.
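The 15% random masking in the last row can be sketched as token-level replacement within a modality stream. This is a minimal illustration under assumed conventions; the function name, mask token, and list-of-tokens interface are hypothetical, not the paper's actual implementation.

```python
import random

def random_mask(tokens, mask_token="[MASK]", p=0.15, seed=0):
    # Hypothetical helper: independently replace each token in one
    # modality stream (visual or pose) with a mask token with
    # probability p (0.15 corresponds to 15% random masking).
    rng = random.Random(seed)
    return [mask_token if rng.random() < p else t for t in tokens]
```

Seeding the generator makes the masking pattern reproducible across runs, which is useful when comparing masking strategies on the same inputs.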