Can Large Language Models Be an Alternative to Human Evaluations?