Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference