Can xLLMs Understand the Structure of Dialog? Exploring Multilingual Response Generation in Complex Scenarios