Reliable, Reproducible, and Really Fast Leaderboards with Evalica

Dec-15-2024–arXiv.org Artificial Intelligence

The rapid advancement of natural language processing (NLP) technologies, such as instruction-tuned large language models (LLMs), urges the development of modern evaluation protocols with human and machine feedback. We introduce Evalica, an open-source toolkit that facilitates the creation of reliable and reproducible model leaderboards. This paper presents its design, evaluates its performance, and demonstrates its usability through ...

Figure 1: Evalica facilitates the highlighted aspects of leaderboard-making that involve aggregation of judgements, scoring the models with bootstrapped confidence intervals (CIs), and getting the final model ranks.
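The three leaderboard steps named in the abstract's figure caption (aggregating judgements, scoring models with bootstrapped confidence intervals, and deriving final ranks) can be illustrated with a minimal sketch. It uses only the Python standard library and does not use Evalica's API. The function names, the average win rate as the scoring rule, and the toy judgements are all assumptions made for illustration, not the toolkit's actual methods or data.

```python
import random
from collections import defaultdict


def win_rates(judgements):
    """Aggregate pairwise judgements into a per-model average win rate.

    Each judgement is (left_model, right_model, winner), where winner is
    "left", "right", or "tie". A tie counts as half a win for each side.
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for left, right, winner in judgements:
        games[left] += 1
        games[right] += 1
        if winner == "left":
            wins[left] += 1.0
        elif winner == "right":
            wins[right] += 1.0
        else:
            wins[left] += 0.5
            wins[right] += 0.5
    return {model: wins[model] / games[model] for model in games}


def bootstrap_ci(judgements, rounds=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs: resample judgements with replacement."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    samples = defaultdict(list)
    for _ in range(rounds):
        resampled = rng.choices(judgements, k=len(judgements))
        for model, score in win_rates(resampled).items():
            samples[model].append(score)
    intervals = {}
    for model, scores in samples.items():
        scores.sort()
        lower = scores[int((alpha / 2) * len(scores))]
        upper = scores[min(len(scores) - 1, int((1 - alpha / 2) * len(scores)))]
        intervals[model] = (lower, upper)
    return intervals


def rank(scores):
    """Assign ordinal ranks (1 = best); equal scores are not treated as ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}


# Hypothetical toy judgements, made up purely for illustration.
judgements = [
    ("A", "B", "left"),
    ("A", "C", "left"),
    ("B", "C", "left"),
    ("A", "B", "tie"),
]
scores = win_rates(judgements)
intervals = bootstrap_ci(judgements, rounds=200)
ranks = rank(scores)
```

With so few judgements the intervals come out very wide. A real leaderboard would need far more comparisons per model pair, and possibly a different scoring rule, before the ranks could be trusted.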

artificial intelligence, large language model, natural language

Country:
- North America > United States
  - New York (0.04)
- Europe > Serbia
  - Central Serbia > Belgrade (0.04)

Genre:
- Research Report > New Finding (0.47)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
