CriticBench: Evaluating Large Language Models as Critic