Ask, Retrieve, Summarize: A Modular Pipeline for Scientific Literature Summarization
Achkar, Pierre, Gollub, Tim, Potthast, Martin
The exponential growth of scientific publications has made it increasingly difficult for researchers to stay updated and synthesize knowledge effectively. This paper presents XSum, a modular pipeline for multi-document summarization (MDS) in the scientific domain using Retrieval-Augmented Generation (RAG). The pipeline includes two core components: a question-generation module and an editor module. The question-generation module dynamically generates questions adapted to the input papers, ensuring the retrieval of relevant and accurate information. The editor module synthesizes the retrieved content into coherent and well-structured summaries that adhere to academic standards for proper citation. Evaluated on the SurveySum dataset, XSum demonstrates strong performance, achieving considerable improvements in metrics such as CheckEval, G-Eval, and Ref-F1 compared to existing approaches. This work provides a transparent, adaptable framework for scientific summarization with potential applications in a wide range of domains. Code is available at https://github.com/webis-de/scolia25-xsum.
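To make the ask-retrieve-summarize flow concrete, below is a minimal sketch of such a modular pipeline. All function names (generate_questions, retrieve, edit_summary), the prompt wording, and the term-overlap retriever are illustrative assumptions for exposition, not the authors' actual implementation; any text-in/text-out LLM callable can be plugged in.

```python
# Hypothetical sketch of an ask-retrieve-summarize pipeline (not the XSum codebase).
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # any text-in/text-out model call


def generate_questions(llm: LLM, papers: List[str], n: int = 5) -> List[str]:
    """Question-generation module: derive questions tailored to the input papers."""
    prompt = (
        f"Given the following paper excerpts, write {n} questions "
        "a survey section should answer:\n\n" + "\n\n".join(papers)
    )
    return [q.strip() for q in llm(prompt).splitlines() if q.strip()][:n]


def retrieve(question: str, chunks: List[str], k: int = 3) -> List[str]:
    """Retrieval step: rank paper chunks by naive term overlap with the question."""
    q_terms = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q_terms & set(c.lower().split())))
    return ranked[:k]


def edit_summary(llm: LLM, qa_pairs: List[Tuple[str, List[str]]]) -> str:
    """Editor module: synthesize retrieved evidence into a coherent, cited summary."""
    evidence = "\n".join(f"Q: {q}\nEvidence: {' | '.join(e)}" for q, e in qa_pairs)
    return llm("Write a coherent, properly cited survey section from this evidence:\n" + evidence)


def xsum_style_pipeline(llm: LLM, papers: List[str], chunks: List[str]) -> str:
    """End-to-end flow: ask (question generation), retrieve, then summarize (editor)."""
    questions = generate_questions(llm, papers)
    qa_pairs = [(q, retrieve(q, chunks)) for q in questions]
    return edit_summary(llm, qa_pairs)
```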
CheckEval: Robust Evaluation Framework using Large Language Model via Checklist
Lee, Yukyung, Kim, Joonghoon, Kim, Jaehee, Cho, Hyowon, Kang, Pilsung
We introduce CheckEval, a novel evaluation framework using Large Language Models that addresses the ambiguity and inconsistency of current evaluation methods. It does so by dividing evaluation criteria into detailed sub-aspects and constructing a checklist of Boolean questions for each, simplifying the evaluation process. This approach not only renders the process more interpretable but also significantly enhances the robustness and reliability of results by focusing on specific evaluation dimensions. Validated through a focused case study on the SummEval benchmark, CheckEval exhibits a strong correlation with human judgments and demonstrates highly consistent inter-annotator agreement. These findings highlight the effectiveness of CheckEval for objective, flexible, and precise evaluations. By offering a customizable and interactive framework, CheckEval sets a new standard for the use of LLMs in evaluation, responding to the evolving needs of the field and establishing a clear method for future LLM-based evaluation.
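The checklist idea can be illustrated with a small sketch: a criterion is decomposed into Boolean questions, each is asked of an LLM, and the pass rate serves as the score. The example checklist items, prompt wording, and pass-rate aggregation below are assumptions for illustration; the paper's actual checklists and prompts may differ.

```python
# Hypothetical sketch of checklist-based evaluation (not the CheckEval codebase).
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # any text-in/text-out model call

# Example sub-aspect checklist for a "coherence" criterion (hypothetical items).
COHERENCE_CHECKLIST: List[str] = [
    "Does the summary present ideas in a logical order?",
    "Are sentences connected without abrupt topic shifts?",
    "Does the summary avoid contradicting itself?",
]


def check_item(llm: LLM, source: str, summary: str, question: str) -> bool:
    """Ask one Boolean question; treat any answer starting with 'yes' as a pass."""
    prompt = (
        f"Source:\n{source}\n\nSummary:\n{summary}\n\n"
        f"Question: {question}\nAnswer strictly Yes or No."
    )
    return llm(prompt).strip().lower().startswith("yes")


def checklist_score(llm: LLM, source: str, summary: str,
                    checklist: List[str]) -> Dict[str, object]:
    """Aggregate Boolean answers into a per-criterion score (fraction of passes)."""
    answers = {q: check_item(llm, source, summary, q) for q in checklist}
    return {"answers": answers, "score": sum(answers.values()) / len(checklist)}
```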