Causal-aware Safe Policy Improvement for Task-oriented Dialogue