An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-Level Structural Information