RoCar: A Relationship Network-based Evaluation Method for Large Language Models

Ming Wang, Wenfang Wu, Chongyun Gao, Daling Wang, Shi Feng, Yifei Zhang

arXiv.org Artificial Intelligence 

Pre-trained models have become the dominant approach in deep learning since the introduction of the Transformer [1]. By now, Large Language Models (LLMs), represented by ChatGPT [2], have received the widest attention from researchers in Artificial Intelligence (AI), especially Natural Language Processing (NLP). Many open-source LLMs, such as LLaMA [3], have also been released [4, 5, 6, 7, 8]. Thanks to the strong reasoning, generation, and memory abilities LLMs acquire during training, they can perform a variety of traditional tasks from specific prompts and achieve strong performance. As a result, LLMs have attracted widespread interest and found applications in the financial [9], emotional [10, 11], legal [12], medical [13, 14, 15], and educational [16] fields.

To evaluate the capabilities of LLMs and to guide the selection of appropriate models for applications, researchers have proposed many evaluation approaches [17]. C-Eval [18] constructs a test set of 13,948 questions across 52 subjects, ranging from middle school to postgraduate and vocational examinations, to evaluate the problem-solving skills of LLMs. Gaokao-Bench [19] collects questions from the 2010-2022 Chinese national college entrance examination (Gaokao) papers, comprising 1,781 objective questions and 1,030 subjective questions, and builds a framework for assessing the language comprehension and logical reasoning abilities of LLMs. Microsoft released AGIEval [20], a benchmark built from 20 official, public, high-standard exams, including general college admission tests (the Chinese Gaokao and the U.S. SAT), law school admission tests, math competitions, bar exams, and national civil service exams.
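Benchmarks such as C-Eval, Gaokao-Bench, and AGIEval largely score an LLM by its accuracy on exam-style multiple-choice questions. The Python sketch below illustrates that general protocol; the question fields and the answer_fn stand-in for a model call are illustrative assumptions, not the actual data format or API of any of these benchmarks.

    # Minimal sketch of accuracy-based evaluation on a multiple-choice
    # benchmark. The question schema and `answer_fn` are hypothetical
    # assumptions for illustration, not any benchmark's real interface.
    from typing import Callable

    def evaluate_multiple_choice(
        questions: list[dict],
        answer_fn: Callable[[str], str],
    ) -> float:
        """Return the accuracy of `answer_fn` over the questions.

        Each question dict is assumed to hold 'question' (str),
        'choices' (mapping of option letter to text), and 'answer'
        (the correct option letter).
        """
        correct = 0
        for q in questions:
            options = "\n".join(f"{k}. {v}" for k, v in q["choices"].items())
            prompt = f"{q['question']}\n{options}\nAnswer with a single letter."
            # Keep only the first character of the reply as the chosen letter.
            prediction = answer_fn(prompt).strip().upper()[:1]
            correct += prediction == q["answer"]
        return correct / len(questions)

    if __name__ == "__main__":
        sample = [{
            "question": "2 + 2 = ?",
            "choices": {"A": "3", "B": "4", "C": "5", "D": "6"},
            "answer": "B",
        }]
        # A lambda stands in for a real LLM call here.
        print(evaluate_multiple_choice(sample, lambda prompt: "B"))  # 1.0

In practice, such benchmarks differ mainly in how they build the prompt (e.g., few-shot exemplars, chain-of-thought instructions) and how they parse the model's reply, while the accuracy computation itself stays this simple.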
