MultiDEC: Multi-Modal Clustering of Image-Caption Pairs
Yang, Sean, Huang, Kuan-Hao, Howe, BIll
In this paper, we propose a method for clustering image-caption pairs by simultaneously learning image representations and text representations that are constrained to exhibit similar distributions. These image-caption pairs arise frequently in high-value applications where structured training data is expensive to produce but free-text descriptions are common. MultiDEC initializes parameters with stacked autoencoders, then iteratively minimizes the Kullback-Leibler divergence between the distribution of the images (and text) to that of a combined joint target distribution. We regularize by penalizing non-uniform distributions across clusters. The representations that minimize this objective produce clusters that outperform both single-view and multi-view techniques on large benchmark image-caption datasets.
Jan-4-2019
- Country:
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Genre:
- Research Report (1.00)
- Technology: