Multimodal Conditional Learning with Fast Thinking Policy-like Model and Slow Thinking Planner-like Model