Aligning Spoken Dialogue Models from User Interactions