Exploring External Knowledge for Accurate Modeling of Visual and Language Problems