Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model