An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science