Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
