JudgeLM: Fine-tuned Large Language Models are Scalable Judges