Evaluating Large Language Models with fmeval

Schwöbel, Pola; Franceschi, Luca; Zafar, Muhammad Bilal; Vasist, Keerthan; Malhotra, Aman; Shenhar, Tomer; Tailor, Pinal; Yilmaz, Pinar; Diamond, Michael; Donini, Michele

Jul-15-2024–arXiv.org Artificial Intelligence

fmeval is an open source library to evaluate large language models (LLMs) on a range of tasks. It helps practitioners evaluate their models for task performance and along multiple responsible AI dimensions. This paper presents the library and sets out its underlying design principles: simplicity, coverage, extensibility and performance. We then show how these principles shaped the scientific and engineering choices made when developing fmeval. A case study demonstrates a typical use case for the library: picking a suitable model for a question answering task. We close by discussing limitations and further work in the development of the library. fmeval can be found at https://github.com/aws/fmeval.
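The case-study workflow, scoring candidate models on a question answering dataset and keeping the best one, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not fmeval's API: the function names (`evaluate_model`, `pick_best_model`), the toy dataset and the model names are hypothetical, and the metrics are the common SQuAD-style exact match and token-level F1, which we assume are representative of the accuracy scores such a comparison would use.

```python
import re
import string
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical sketch of the case-study workflow; this is NOT the fmeval API.
# A QA record is (question, reference_answer); names and data are illustrative.
QARecord = Tuple[str, str]


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, then split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(reference: str, prediction: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(reference) == normalize(prediction))


def token_f1(reference: str, prediction: str) -> float:
    """Harmonic mean of token precision and recall (SQuAD-style F1)."""
    ref_tokens = normalize(reference)
    pred_tokens = normalize(prediction)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(ref_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate_model(dataset: List[QARecord], answers: List[str]) -> Dict[str, float]:
    """Average exact match and F1 of one model's answers over the dataset."""
    if not dataset or len(answers) != len(dataset):
        raise ValueError("need a non-empty dataset and one answer per record")
    pairs = [(ref, ans) for (_, ref), ans in zip(dataset, answers)]
    n = len(pairs)
    return {
        "exact_match": sum(exact_match(r, a) for r, a in pairs) / n,
        "f1": sum(token_f1(r, a) for r, a in pairs) / n,
    }


def pick_best_model(
    dataset: List[QARecord],
    candidate_answers: Dict[str, List[str]],
    metric: str = "f1",
) -> Tuple[str, Dict[str, Dict[str, float]]]:
    """Score every candidate and return the name with the highest `metric`."""
    scores = {name: evaluate_model(dataset, ans) for name, ans in candidate_answers.items()}
    best = max(scores, key=lambda name: scores[name][metric])
    return best, scores


# Toy data: in practice the answers would come from querying each candidate model.
dataset = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "William Shakespeare"),
]
candidate_answers = {
    "model-a": ["Paris", "Shakespeare"],
    "model-b": ["The capital is Paris.", "Christopher Marlowe"],
}

best, scores = pick_best_model(dataset, candidate_answers)
for name, result in scores.items():
    print(name, result)
print("best:", best)
```

With this toy data, `model-a` wins on F1 (about 0.83 versus 0.25) because its answers overlap the references more fully; token F1 also gives partial credit to "Shakespeare" where exact match gives none.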

large language model, machine learning, natural language

Country:
- Asia (0.93)
- Europe > Germany
  - Berlin (0.14)
- North America > United States (0.68)

Genre:
- Research Report (0.40)

Industry:
- Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks > Deep Learning (0.68)
    - Performance Analysis > Accuracy (0.48)
  - Natural Language > Large Language Model (1.00)
