On Explaining Visual Captioning with Hybrid Markov Logic Networks