JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation