Large Language Models as Counterfactual Generator: Strengths and Weaknesses