UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance