A Survey of Current Datasets for Vision and Language Research