Benchmark Data Contamination of Large Language Models: A Survey

Xu, Cheng, Guan, Shuhao, Greene, Derek, Kechadi, M-Tahar

Jun-6-2024–arXiv.org Artificial Intelligence

The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and Gemini has transformed the field of natural language processing. However, it has also resulted in a significant issue known as Benchmark Data Contamination (BDC). This occurs when language models inadvertently incorporate evaluation benchmark information from their training data, leading to inaccurate or unreliable performance during the evaluation phase of the process. This paper reviews the complex challenge of BDC in LLM evaluation and explores alternative assessment methods to mitigate the risks associated with traditional benchmarks. The paper also examines challenges and future directions in mitigating BDC risks, highlighting the complexity of the issue and the need for innovative solutions to ensure the reliability of LLM evaluation in real-world applications.

computational linguistic, language model, proceedings, (13 more...)

arXiv.org Artificial Intelligence

Jun-6-2024

arXiv.org PDF

Add feedback

Country:
- South America > Colombia
  - Meta Department > Villavicencio (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Georgia > Fulton County
      - Atlanta (0.04)
    - California
      - Orange County > Anaheim (0.04)
      - Los Angeles County > Long Beach (0.04)
  - Canada
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
    - Alberta > Census Division No. 11
      - Edmonton Metropolitan Region > Edmonton (0.04)
- Europe
  - Czechia > Prague (0.04)
  - Spain > Galicia
    - Madrid (0.04)
  - Middle East > Malta
    - Eastern Region > Northern Harbour District > St. Julian's (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Finland > Pirkanmaa
    - Tampere (0.04)
  - Ukraine > Kyiv Oblast
    - Kyiv (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Singapore (0.04)
  - Indonesia > Bali (0.04)
  - Vietnam (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
  - China > Beijing
    - Beijing (0.04)

Genre:
- Overview (1.00)
- Research Report
  - New Finding (1.00)
  - Promising Solution (0.88)

Industry:
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Education (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found