Deep Multimodal Multilinear Fusion with High-order Polynomial Pooling
Ming Hou, Jiajia Tang, Jianhai Zhang, Wanzeng Kong, Qibin Zhao
Neural Information Processing Systems
More importantly, simply fusing all features at once ignores the complex local intercorrelations between them, which degrades prediction performance. In this work, we first propose a polynomial tensor pooling (PTP) block that integrates multimodal features by considering their high-order moments and then passes the result through a tensorized fully connected layer. Treating PTP as a building block, we further establish a hierarchical polynomial fusion network (HPFN) that recursively transmits local correlations into global ones.
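To make the pooling idea concrete, the following is a minimal NumPy sketch of one way a PTP-style block could be computed. It is an illustration under stated assumptions, not the authors' implementation: it assumes the features are concatenated with a constant 1, that the order-P moments are the P-fold outer product of that vector, and that the tensorized fully connected layer uses CP-factorized (low-rank) weights. The function name `polynomial_tensor_pooling` and the `factors` layout are hypothetical.

```python
import numpy as np

def polynomial_tensor_pooling(features, factors):
    """Sketch of an order-P polynomial pooling step (hypothetical layout).

    features: list of 1-D arrays, one per modality or local window.
    factors:  array of shape (P, R, H, D + 1), where P is the polynomial
              order, R the assumed CP rank, H the output size, and D the
              total length of the concatenated features.
    """
    # Prepend a constant 1 so lower-order interactions are also covered.
    f = np.concatenate([np.ones(1)] + list(features))
    # Project f with every factor matrix: result has shape (P, R, H).
    proj = np.einsum('prhd,d->prh', factors, f)
    # Multiply across the P orders, then sum over the R rank components.
    # This equals contracting the P-fold outer product of f with a
    # CP-factorized weight tensor, without ever forming that product.
    return proj.prod(axis=0).sum(axis=0)

# Usage: fuse two feature vectors with an order-2 block.
rng = np.random.default_rng(0)
audio = rng.normal(size=3)
video = rng.normal(size=4)
factors = rng.normal(size=(2, 5, 6, 1 + 3 + 4))
fused = polynomial_tensor_pooling([audio, video], factors)
print(fused.shape)
```

Under these assumptions the cost grows linearly with the order P rather than exponentially, because the explicit moment tensor is never materialized. A hierarchical network in the spirit of HPFN could then apply such a block to local groups of features and feed the outputs into further blocks.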