An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks