Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations