Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Oct-9-2025, 01:30:54 GMT – Neural Information Processing Systems

We argue that this discrepancy primarily arises due to existing evaluation that only measures LLMs'

arXiv preprint, large language model, machine learning

Country:
- North America > United States
  - Hawaii (0.04)
  - Massachusetts > Suffolk County
    - Boston (0.04)
  - California > San Diego County
    - San Diego (0.04)
- Asia > Myanmar
  - Tanintharyi Region > Dawei (0.04)

Genre:
- Research Report (0.68)

Industry:
- Education (0.46)
- Banking & Finance > Economy (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.79)