Can Large Language Models Be an Alternative to Human Evaluations?
