Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment