Large Language Models as Evaluators for Scientific Synthesis