Deep Multimodal Multilinear Fusion with High-order Polynomial Pooling
Tensor-based multimodal fusion techniques have exhibited great predictive performance. However, one limitation is that existing approaches only consider bilinear or trilinear pooling, which restricts the order of interactions and therefore fails to unleash the full expressive power of multilinear fusion. More importantly, simply fusing features all at once ignores the complex local intercorrelations, which degrades prediction. In this work, we first propose a polynomial tensor pooling (PTP) block for integrating multimodal features by considering high-order moments, followed by a tensorized fully connected layer. Treating PTP as a building block, we further establish a hierarchical polynomial fusion network (HPFN) to recursively transmit local correlations into global ones. By stacking multiple PTPs, the expressive capacity of HPFN grows exponentially w.r.t. the number of layers, which is shown by its equivalence to a very deep convolutional arithmetic circuit. Various experiments demonstrate that HPFN achieves state-of-the-art performance.
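To make the PTP idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the "tensorized fully connected layer" is realized with a rank-R CP factorization of the weight tensor, so that the order-P tensor product of the concatenated features never has to be formed explicitly. The function name `ptp_block`, the parameter names `factors` and `out_weights`, and all shapes are hypothetical choices made for illustration.

```python
import numpy as np

def ptp_block(features, factors, out_weights):
    """Sketch of one polynomial tensor pooling (PTP) block.

    features    : list of 1-D arrays, one per modality (or time step).
    factors     : array of shape (P, R, D), hypothetical CP factors, where
                  P is the polynomial order, R the assumed rank, and
                  D = 1 + total feature dimension.
    out_weights : array of shape (H, R) mixing the R rank-one terms into
                  H output units.
    """
    # Concatenate the features and prepend a constant 1 so that the
    # order-P product also contains all lower-order interaction terms.
    f = np.concatenate([[1.0], *features])      # shape (D,)

    # Project f with every factor matrix: one (R,) vector per order p.
    proj = factors @ f                          # shape (P, R)

    # Multiplying the P projections elementwise equals contracting the
    # order-P tensor f (x) ... (x) f with a rank-R CP weight tensor.
    rank_terms = proj.prod(axis=0)              # shape (R,)

    return out_weights @ rank_terms             # shape (H,)

# Usage: fuse three hypothetical modality vectors with order-3 pooling.
rng = np.random.default_rng(0)
modalities = [rng.normal(size=4), rng.normal(size=3), rng.normal(size=2)]
P, R, H = 3, 8, 5
D = 1 + sum(m.size for m in modalities)
fused = ptp_block(modalities,
                  rng.normal(size=(P, R, D)),
                  rng.normal(size=(H, R)))
print(fused.shape)
```

Under the same assumptions, a hierarchical network in the spirit of HPFN would apply such blocks to local groups of features first and then feed their outputs, as the new `features` list, into a further block in the next layer.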
Reviews: Deep Multimodal Multilinear Fusion with High-order Polynomial Pooling
The preliminaries section lays out the required mathematical formulations of tensor networks. While the section serves its function, it would be clearer if the authors spaced out and typeset the maths (as is done in Section 3.1). The visualizations in Figures 2 and 3 illustrate how a fusion network (and a hierarchical network) can be constructed from PTP units. These visualizations clearly communicate how features are pooled across modalities and time steps. That said, the description of HPFN (Section 3.2) is perhaps overly verbose.
Hou, Ming, Tang, Jiajia, Zhang, Jianhai, Kong, Wanzeng, Zhao, Qibin