Rescue Conversations from Dead-ends: Efficient Exploration for Task-oriented Dialogue Policy Optimization