Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training