Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs

Yaldiz, Duygu Nur, Bakman, Yavuz Faruk, Buyukates, Baturalp, Tao, Chenyang, Ramakrishna, Anil, Dimitriadis, Dimitrios, Avestimehr, Salman

Jun-17-2024–arXiv.org Artificial Intelligence

In this work, we introduce the Learnable Response Scoring Function (LARS) for Uncertainty Estimation (UE) in generative Large Language Models (LLMs). Current scoring functions for probability-based UE, such as length-normalized scoring and semantic contribution-based weighting, are designed to solve specific aspects of the problem but exhibit limitations, including the inability to handle biased probabilities and under-performance in low-resource languages like Turkish. To address these issues, we propose LARS, a scoring function that leverages supervised data to capture complex dependencies between tokens and probabilities, thereby producing more reliable and calibrated response scores in computing the uncertainty of generations. Our extensive experiments across multiple datasets show that LARS substantially outperforms existing scoring functions considering various probability-based UE methods.

dataset, lar, probability, (16 more...)

arXiv.org Artificial Intelligence

Jun-17-2024

arXiv.org PDF

Add feedback

Country:
- Europe > United Kingdom (0.04)
- South America > Brazil
  - Rio de Janeiro > Rio de Janeiro (0.04)
  - Federal District > Brasília (0.04)
- North America
  - Canada (0.04)
  - United States > Washington
    - King County > Seattle (0.04)
- Asia
  - Singapore (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Performance Analysis
    - Accuracy (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found