Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval