Hadamard Product for Low-rank Bilinear Pooling

Kim, Jin-Hwa, On, Kyoung-Woon, Lim, Woosang, Kim, Jeonghee, Ha, Jung-Woo, Zhang, Byoung-Tak

arXiv.org Artificial Intelligence 

Bilinear models provide rich representations compared with linear models. They have been applied in various visual tasks, such as object recognition, segmentation, and visual question-answering, to get state-of-the-art performances taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, limiting the applicability to computationally complex tasks. We propose low-rank bilinear pooling using Hadamard product for an efficient attention mechanism of multimodal learning. We show that our model outperforms compact bilinear pooling in visual question-answering tasks with the state-of-the-art results on the VQA dataset, having a better parsimonious property. Bilinear models (Tenenbaum & Freeman, 2000) provide richer representations than linear models. To exploit this advantage, fully-connected layers in neural networks can be replaced with bilinear pooling. The outer product of two vectors (or Kroneker product for matrices) is involved in bilinear pooling, as a result of this, all pairwise interactions among given features are considered. Recently, a successful application of this technique is used for fine-grained visual recognition (Lin et al., 2015).

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found