Assessing the efficacy of large language models in generating accurate teacher responses