C3LLM: Conditional Multimodal Content Generation Using Large Language Models