Ranger: A Toolkit for Effect-Size Based Multi-Task Evaluation

Sertkan, Mete; Althammer, Sophia; Hofstätter, Sebastian

May-24-2023–arXiv.org Artificial Intelligence

In this paper, we introduce Ranger, a toolkit that makes effect-size-based meta-analysis easy to use for multi-task evaluation in NLP and IR. We observed that our communities often face the challenge of aggregating results over incomparable metrics and scenarios, which makes conclusions and take-away messages less reliable. With Ranger, we aim to address this issue by providing a task-agnostic toolkit that combines the effect of a treatment on multiple tasks into one statistical evaluation, allowing metrics to be compared and an overall summary effect to be computed. Our toolkit produces publication-ready forest plots that clearly communicate evaluation results over multiple tasks. Our goal with the ready-to-use Ranger toolkit is to promote robust, effect-size-based evaluation and improve evaluation standards in the community. We provide two case studies for common IR and NLP settings to highlight Ranger's benefits.
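The core idea, turning each task's result into a comparable effect size and pooling those into one summary effect, can be illustrated with a short sketch. This is not Ranger's API and not necessarily its exact method. It assumes paired per-query scores for a control and a treatment system on every task, uses the standardized mean difference of the paired differences as the per-task effect size (with a common large-sample variance approximation), and pools tasks with a DerSimonian-Laird random-effects model. The names `Effect`, `paired_smd`, and `summary_effect` are hypothetical, and the example scores are made up.

```python
"""Minimal sketch of effect-size-based multi-task aggregation (not Ranger's API)."""
from __future__ import annotations

import math
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Effect:
    label: str
    d: float    # effect size (standardized mean difference)
    var: float  # approximate sampling variance of d

    @property
    def ci95(self) -> tuple[float, float]:
        """Normal-approximation 95% confidence interval."""
        half = 1.96 * math.sqrt(self.var)
        return self.d - half, self.d + half


def paired_smd(label: str, control: list[float], treatment: list[float]) -> Effect:
    """Hypothetical helper: standardized mean difference of paired per-query scores.

    d = mean(diff) / sd(diff); var(d) is approximated as 1/n + d^2 / (2n).
    """
    if len(control) != len(treatment) or len(control) < 2:
        raise ValueError("need at least two paired scores per task")
    diffs = [t - c for c, t in zip(control, treatment)]
    n = len(diffs)
    d = mean(diffs) / stdev(diffs)  # raises ZeroDivisionError if all diffs are equal
    return Effect(label, d, 1 / n + d * d / (2 * n))


def summary_effect(effects: list[Effect]) -> tuple[Effect, float]:
    """Hypothetical helper: DerSimonian-Laird random-effects pooling.

    Returns the summary effect and the between-task variance tau^2.
    """
    w = [1 / e.var for e in effects]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * e.d for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e.d - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (e.var + tau2) for e in effects]  # random-effects weights
    d = sum(wi * e.d for wi, e in zip(w_re, effects)) / sum(w_re)
    return Effect("Summary (RE)", d, 1 / sum(w_re)), tau2


if __name__ == "__main__":
    # Illustrative per-query scores (made up, not results from the paper).
    tasks = {
        "Task A (nDCG@10)": ([0.41, 0.38, 0.55, 0.60, 0.47], [0.45, 0.40, 0.58, 0.66, 0.49]),
        "Task B (F1)": ([0.70, 0.64, 0.72, 0.68], [0.71, 0.69, 0.75, 0.70]),
    }
    effects = [paired_smd(name, c, t) for name, (c, t) in tasks.items()]
    summary, tau2 = summary_effect(effects)
    # Text stand-in for a forest plot: one row per task plus the summary row.
    for e in effects + [summary]:
        lo, hi = e.ci95
        print(f"{e.label:<20} d={e.d:+.2f}  95% CI [{lo:+.2f}, {hi:+.2f}]")
    print(f"tau^2 = {tau2:.3f}")
```

Random-effects pooling is used here because tasks measured with different metrics and datasets rarely share a single true effect; when the estimated tau^2 is 0, the summary reduces to the ordinary fixed-effect inverse-variance estimate.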

artificial intelligence, evaluation, natural language

Country:
- North America
  - Canada (0.04)
  - United States
    - New York > New York County
      - New York City (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
- Europe
  - Ireland (0.04)
  - Norway > Western Norway
    - Rogaland > Stavanger (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence > Natural Language (1.00)