Can Smaller Large Language Models Evaluate Research Quality?