Bilinear Attention Networks
Kim, Jin-Hwa; Jun, Jaehyun; Zhang, Byoung-Tak
Neural Information Processing Systems
Attention networks in multimodal learning provide an efficient way to utilize given visual information selectively. However, the computational cost of learning attention distributions for every pair of multimodal input channels is prohibitively high. To solve this problem, co-attention builds a separate attention distribution for each modality, neglecting the interaction between multimodal inputs. In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly. BAN considers bilinear interactions between two groups of input channels, while low-rank bilinear pooling extracts the joint representations for each pair of channels.
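The mechanism described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the names `X`, `Y`, `U`, `V`, `p`, all shapes, and the reuse of one pair of projections for both the attention logits and the pooled feature are assumptions made for illustration, and the nonlinearities a real model would use are omitted. Each pair of channels (one from each modality) receives a low-rank bilinear score, a single softmax over all pairs yields the bilinear attention distribution, and the joint representation is the attention-weighted sum of the per-pair low-rank bilinear features.

```python
import math
import random


def project(x, W):
    """Project a vector x (length N) with a matrix W (N x K) to length K."""
    return [sum(x[n] * W[n][k] for n in range(len(x))) for k in range(len(W[0]))]


def bilinear_attention(X, Y, U, V, p):
    """Illustrative bilinear attention over every pair of channels.

    X: rho channels of dimension N (e.g. question tokens)  -- assumed layout
    Y: phi channels of dimension M (e.g. image regions)    -- assumed layout
    U: N x K and V: M x K low-rank projections, p: length-K pooling vector.
    Returns (A, f): A is a rho x phi attention map that sums to 1,
    f is the length-K joint representation.
    """
    XU = [project(x, U) for x in X]
    YV = [project(y, V) for y in Y]
    K = len(p)

    # Low-rank bilinear score for each (i, j) pair of channels.
    logits = [[sum(p[k] * xu[k] * yv[k] for k in range(K)) for yv in YV]
              for xu in XU]

    # One softmax over all pairs (shifted by the max for numerical stability).
    m = max(max(row) for row in logits)
    exps = [[math.exp(v - m) for v in row] for row in logits]
    Z = sum(sum(row) for row in exps)
    A = [[e / Z for e in row] for row in exps]

    # Joint feature: attention-weighted sum of per-pair low-rank bilinear terms.
    f = [sum(A[i][j] * XU[i][k] * YV[j][k]
             for i in range(len(X)) for j in range(len(Y)))
         for k in range(K)]
    return A, f


def random_matrix(rows, cols, rng):
    """Toy data helper (hypothetical, for the demo only)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, M, K, rho, phi = 4, 5, 3, 2, 6   # toy sizes chosen arbitrarily
X = random_matrix(rho, N, rng)
Y = random_matrix(phi, M, rng)
U = random_matrix(N, K, rng)
V = random_matrix(M, K, rng)
p = [rng.uniform(-1.0, 1.0) for _ in range(K)]

A, f = bilinear_attention(X, Y, U, V, p)
```

In this sketch the attention map has one entry per channel pair (rho x phi), in contrast to co-attention, which would keep one distribution of length rho and another of length phi. The low-rank projections keep the per-pair score at K multiplications rather than a full N x M bilinear form.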
Feb-14-2020, 08:14:20 GMT