Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies