Making Multimodal Generation Easier: When Diffusion Models Meet LLMs