AITopics | human evaluation score

Assessing Color Vision Test in Large Vision-language Models

Ye, Hongfei, Chen, Bin, Liu, Wenxi, Zhang, Yu, Li, Zhao, Ni, Dandan, Chen, Hongyang

arXiv.org Artificial Intelligence, Jul-16-2025

With the widespread adoption of large vision-language models, the capacity for color vision in these models is crucial. However, the color vision abilities of large vision-language models have not yet been thoroughly explored. To address this gap, we define a color vision testing task for large vision-language models and construct a dataset (an anonymized GitHub repository showing some of the data: https://anonymous.4open.science/r/color-vision-test-dataset-3BCD) that covers multiple categories of test questions and tasks of varying difficulty levels. Furthermore, we analyze the types of errors made by large vision-language models and propose fine-tuning strategies to enhance their performance in color vision tests.

category, large language model, machine learning, (13 more...)

arXiv:2507.11153

Country: Asia > China (0.30)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.98)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Legal Evalutions and Challenges of Large Language Models

Wang, Jiaqi, Zhao, Huan, Yang, Zhenyuan, Shu, Peng, Chen, Junhao, Sun, Haobo, Liang, Ruixi, Li, Shixin, Shi, Pengcheng, Ma, Longjun, Liu, Zongjia, Liu, Zhengliang, Zhong, Tianyang, Zhang, Yutong, Ma, Chong, Zhang, Xin, Zhang, Tuo, Ding, Tianli, Ren, Yudan, Liu, Tianming, Jiang, Xi, Zhang, Shu

arXiv.org Artificial Intelligence, Nov-15-2024

In this paper, we review legal testing methods based on Large Language Models (LLMs), using the OpenAI o1 model as a case study to evaluate the performance of large models in applying legal provisions. We compare current state-of-the-art LLMs, including open-source models, closed-source models, and models trained specifically for the legal domain. Systematic tests are conducted on English and Chinese legal cases, and the results are analyzed in depth. Drawing on these tests of legal cases from common law systems and China, this paper explores the strengths and weaknesses of LLMs in understanding and applying legal texts, reasoning through legal issues, and predicting judgments. The experimental results highlight both the potential and limitations of LLMs in legal applications, particularly the challenges of interpreting legal language and reasoning accurately about the law. Finally, the paper provides a comprehensive analysis of the advantages and disadvantages of various types of models, offering insights and references for the future application of AI in the legal field.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv:2411.10137

Country:

Asia > China (0.34)
North America > United States (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Quality-Diversity through AI Feedback

Bradley, Herbie, Dai, Andrew, Teufel, Hannah, Zhang, Jenny, Oostermeijer, Koen, Bellagente, Marco, Clune, Jeff, Stanley, Kenneth, Schott, Grégory, Lehman, Joel

arXiv.org Artificial Intelligence, Dec-7-2023

In many text-generation problems, users may prefer not only a single response, but a diverse range of high-quality outputs from which to choose. Quality-diversity (QD) search algorithms aim at such outcomes, by continually improving and diversifying a population of candidates. However, the applicability of QD to qualitative domains, like creative writing, has been limited by the difficulty of algorithmically specifying measures of quality and diversity. Interestingly, recent developments in language models (LMs) have enabled guiding search through AI feedback, wherein LMs are prompted in natural language to evaluate qualitative aspects of text. Leveraging this development, we introduce Quality-Diversity through AI Feedback (QDAIF), wherein an evolutionary algorithm applies LMs to both generate variation and evaluate the quality and diversity of candidate text. When assessed on creative writing domains, QDAIF covers more of a specified search space with high-quality samples than do non-QD controls. Further, human evaluation of QDAIF-generated creative texts validates reasonable agreement between AI and human evaluation. Our results thus highlight the potential of AI feedback to guide open-ended search for creative and original solutions, providing a recipe that seemingly generalizes to many domains and modalities. In this way, QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve, which are among the core skills underlying human society's capacity for innovation.
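
The loop described in this abstract (generate variation, score quality, score a diversity attribute, keep the best candidate per niche) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple grid archive in the style of MAP-Elites, a common quality-diversity algorithm; the abstract does not say which QD algorithm is used. The three functions `mutate_text`, `evaluate_quality`, and `evaluate_diversity` are hypothetical stand-ins for the language-model calls and use toy heuristics so that the example runs offline.

```python
import random


def mutate_text(text, rng):
    """Hypothetical stand-in for LM-generated variation: append a random word."""
    words = ["bright", "dark", "quiet", "loud", "happy", "sad"]
    return text + " " + rng.choice(words)


def evaluate_quality(text):
    """Hypothetical stand-in for LM-judged quality: prefers texts near 8 words.

    Returns a score in [0, 1].
    """
    n = len(text.split())
    return max(0.0, 1.0 - abs(n - 8) / 8)


def evaluate_diversity(text):
    """Hypothetical stand-in for an LM-judged attribute: share of 'positive' words.

    Returns a value in [0, 1] that decides which archive bin the text falls in.
    """
    positive = {"bright", "happy", "loud"}
    words = text.split()
    if not words:
        return 0.0
    return sum(w in positive for w in words) / len(words)


def qd_search(seed_text, iterations=200, num_bins=5, seed=0):
    """Grid-archive QD loop (assumed MAP-Elites style): one best text per bin."""
    rng = random.Random(seed)
    archive = {}  # bin index -> (quality, text)

    def try_insert(text):
        quality = evaluate_quality(text)
        attribute = evaluate_diversity(text)
        # Map the attribute in [0, 1] to one of num_bins discrete niches.
        bin_index = min(int(attribute * num_bins), num_bins - 1)
        incumbent = archive.get(bin_index)
        # Keep the candidate only if its niche is empty or it beats the incumbent.
        if incumbent is None or quality > incumbent[0]:
            archive[bin_index] = (quality, text)

    try_insert(seed_text)
    for _ in range(iterations):
        # Pick a parent uniformly from the current elites and vary it.
        _, parent = rng.choice(list(archive.values()))
        try_insert(mutate_text(parent, rng))
    return archive


if __name__ == "__main__":
    for bin_index, (quality, text) in sorted(qd_search("the story begins").items()):
        print(bin_index, round(quality, 2), text)
```

In a real system, the three stand-in functions would be replaced by prompts to a language model. The archive logic is the part that pushes the search to cover many niches instead of converging on a single best text.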

diversity characteristic, human evaluation score, neural information processing system, (16 more...)

arXiv:2310.13032

Country: