Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings