An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks
