Exploring Precision and Recall to assess the quality and diversity of LLMs
Le Bronnec, Florian, Verine, Alexandre, Negrevergne, Benjamin, Chevaleyre, Yann, Allauzen, Alexandre
arXiv.org Artificial Intelligence
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral, focusing on importing Precision and Recall metrics from image generation to text generation. This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora. By conducting a comprehensive evaluation of state-of-the-art language models, the study reveals new insights into their performance on open-ended generation tasks, which are not adequately captured by traditional benchmarks. The findings highlight a trade-off between the quality and diversity of generated samples, particularly when models are fine-tuned on instruction datasets or with human feedback. This work extends the toolkit for distribution-based NLP evaluation, offering insights into the practical capabilities and challenges that current LLMs face in generating diverse and high-quality text. We release our code and data.
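The abstract does not say which Precision and Recall estimator is used, so the following is only a minimal sketch of one estimator common in image generation: the k-nearest-neighbour support estimate. It assumes each text has already been mapped to a fixed-size embedding by some encoder (not shown). The function names `knn_radii`, `coverage` and `precision_recall` are hypothetical, chosen for illustration. Precision is the fraction of generated samples that fall inside the estimated support of the reference data (quality). Recall is the fraction of reference samples that fall inside the estimated support of the generated data (diversity).

```python
import numpy as np

def knn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour in the same set."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # After sorting, column 0 is the zero distance of a point to itself,
    # so column k holds the distance to the k-th nearest neighbour.
    return np.sort(d, axis=1)[:, k]

def coverage(queries, support, k):
    """Fraction of queries lying inside at least one k-NN ball of `support`."""
    radii = knn_radii(support, k)
    d = np.linalg.norm(queries[:, None, :] - support[None, :, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))

def precision_recall(reference, generated, k=3):
    """k-NN support estimate of precision and recall (illustrative assumption,
    not necessarily the estimator used by the authors)."""
    precision = coverage(generated, reference, k)  # quality of generated samples
    recall = coverage(reference, generated, k)     # diversity of generated samples
    return precision, recall
```

In this sketch, a model that produces only realistic text from a narrow slice of the reference distribution would score high precision and low recall, which is the kind of quality and diversity trade-off the abstract describes.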
Jun-4-2024
- Country:
- Asia > Russia (0.14)
- Europe
- Poland > Masovia Province
- Warsaw (0.04)
- United Kingdom > Scotland (0.04)
- France > Île-de-France
- Ukraine > Kyiv Oblast
- Kyiv (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Russia (0.14)
- Germany (0.04)
- Austria > Salzburg
- Salzburg (0.04)
- Switzerland > Zürich
- Zürich (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Quebec > Montreal (0.04)
- United States
- California > San Diego County
- San Diego (0.04)
- Florida
- Miami-Dade County > Miami Beach (0.04)
- Palm Beach County > Boynton Beach (0.04)
- New Jersey > Mercer County
- Princeton (0.04)
- New York (0.04)
- Pennsylvania (0.04)
- Genre:
- Personal > Honors (1.00)
- Research Report > New Finding (0.92)
- Industry:
- Government > Military (0.67)
- Health & Medicine (0.68)
- Leisure & Entertainment > Sports
- Soccer (0.67)
- Media (1.00)
- Technology: