AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions
Wang, Zihan, Chen, Jiaze, Liu, Zhicheng, Mak, Markus, Du, Yidi, Moon, Geonsik, Xu, Luoqi, Tua, Aaron, Peng, Kunshuo, Lu, Jiayi, Xia, Mingfei, Zou, Boqian, Ran, Chenyang, Tian, Guang, Zhu, Shoutai, Duan, Yeheng, Kang, Zhenghui, Lin, Zhenxing, Li, Shangshu, Luo, Qiang, Long, Qingshen, Chen, Zhiyong, Xiao, Yihan, Wu, Yurong, Zan, Daoguang, Fu, Yuyi, Wang, Mingxuan, Ding, Ming
arXiv.org Artificial Intelligence
Competitive programming has emerged as a critical benchmark for evaluating the reasoning and coding capabilities of Large Language Models (LLMs). Despite impressive progress on existing benchmarks, we argue that current evaluations overstate model proficiency, masking a substantial gap between LLMs and elite human programmers. This gap arises from two key limitations: insufficient difficulty and scope of benchmark problems, and evaluation bias from low-quality test cases. To address these shortcomings, we present AetherCode, a new benchmark that draws problems from premier programming competitions such as IOI and ICPC, offering broader coverage and higher difficulty. AetherCode further incorporates comprehensive, expert-validated test suites built through a hybrid of automated generation and human curation, ensuring rigorous and reliable assessment. By combining challenging problem design with robust evaluation, AetherCode provides a more faithful measure of LLM capabilities and sets a new standard for future research in code reasoning.
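The abstract argues that low-quality test cases bias evaluation. The following minimal Python sketch illustrates that point concretely. It is not AetherCode's judging pipeline: the problem (maximum subarray sum), the function names, and the test cases are hypothetical. It shows a weak test suite accepting an incorrect solution that a more thorough suite rejects. The more thorough suite adds an edge case, an all-negative array.

```python
from typing import Callable, List, Tuple

# Hypothetical test-case format: (input array, expected output).
TestCase = Tuple[List[int], int]


def judge(solution: Callable[[List[int]], int], tests: List[TestCase]) -> bool:
    """Accept a solution only if it matches the expected output on every test case."""
    return all(solution(inp) == expected for inp, expected in tests)


def max_subarray_reference(a: List[int]) -> int:
    """Correct Kadane's algorithm: maximum sum of a non-empty contiguous subarray."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def max_subarray_buggy(a: List[int]) -> int:
    """Plausible-looking bug: clamps sums at 0, so it returns 0 on all-negative input."""
    best = cur = 0
    for x in a:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best


# A weak suite with only "typical" inputs, and a stronger suite adding edge cases.
weak_tests: List[TestCase] = [([1, 2, 3], 6), ([2, -1, 2], 3)]
strong_tests: List[TestCase] = weak_tests + [([-3, -1, -2], -1), ([5], 5)]

if __name__ == "__main__":
    print(judge(max_subarray_buggy, weak_tests))        # True: the weak suite accepts the bug
    print(judge(max_subarray_buggy, strong_tests))      # False: the all-negative case exposes it
    print(judge(max_subarray_reference, strong_tests))  # True
```

Under the weak suite, the buggy solution would be counted as correct and inflate the measured pass rate. This is the kind of overestimate that more comprehensive test suites are meant to prevent.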
Aug-25-2025