Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs
