GAOKAO-Eval: Do high scores truly reflect strong capabilities in LLMs?
