Evaluating the Performance of Large Language Models on GAOKAO Benchmark