Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference
