StackEval: Benchmarking LLMs in Coding Assistance

Neural Information Processing Systems 

We also assess LLMs' proficiency as judges for coding tasks using a curated, human-annotated dataset, exploring their evaluation capabilities and potential biases, including whether they favor their own generated solutions. Our findings underscore the potential of these benchmarks to advance LLM development and application in coding assistance.
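One bias the abstract highlights is self-preference: a judge model rating its own outputs more favorably than others'. As a rough illustration (not the paper's actual protocol; the data and function below are hypothetical), such a bias can be quantified by comparing, per judge, the acceptance rate on its own solutions against its acceptance rate on solutions authored by other models:

```python
from collections import defaultdict

# Toy (judge, author, accepted) verdicts -- purely illustrative data.
verdicts = [
    ("model_a", "model_a", True),
    ("model_a", "model_b", False),
    ("model_a", "model_b", True),
    ("model_b", "model_b", True),
    ("model_b", "model_a", True),
    ("model_b", "model_a", False),
]

def self_preference(verdicts):
    """Per judge: acceptance rate on own outputs minus rate on others'.

    A positive value means the judge accepts its own solutions more
    often than solutions written by other models.
    """
    own, other = defaultdict(list), defaultdict(list)
    for judge, author, accepted in verdicts:
        (own if judge == author else other)[judge].append(accepted)
    return {
        j: sum(own[j]) / len(own[j]) - sum(other[j]) / len(other[j])
        for j in own
        if other[j]  # only judges seen on both own and others' outputs
    }

print(self_preference(verdicts))
# → {'model_a': 0.5, 'model_b': 0.5}
```

Here each judge accepts 100% of its own solutions but only 50% of the other model's, yielding a +0.5 gap for both. In practice such an estimate would need many annotated examples and a human-labeled ground truth to separate genuine quality differences from bias.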
