StackEval: Benchmarking LLMs in Coding Assistance
–Neural Information Processing Systems
LLMs' proficiency as judges for coding tasks using a curated, human-annotated dataset, exploring their evaluation capabilities and potential biases, including whether they favor their own generated solutions. Our findings underscore the potential of these benchmarks to advance LLM development and application in coding assistance.
Feb-11-2026, 21:08:29 GMT
- Country:
- Asia > Myanmar
- Tanintharyi Region > Dawei (0.04)
- North America > United States (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Information Technology (0.47)
- Technology: