kGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution

Mar-21-2026, 13:50:34 GMT–Neural Information Processing Systems

Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks. In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel. Unlike application-level software, a systems codebase like Linux is multilingual (low-level C/Assembly/Bash/Rust), gigantic (>20 million lines), critical (impacting billions of devices worldwide), and highly concurrent (involving complex multi-threading). To evaluate whether machine learning (ML) models are useful when developing such large-scale systems-level software, we introduce kGym (a platform) and kBench (a dataset). The kGym platform provides an SE environment for large-scale experiments on the Linux kernel, including compiling and running kernels in parallel across several virtual machines, detecting operations and crashes, inspecting logs, and querying and patching the code base.
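
As a rough illustration of the "running in parallel" and "detecting crashes" steps mentioned above, the sketch below scans several kernel console logs concurrently for common crash markers. It is not kGym's actual interface: the names `find_crash`, `triage`, `CRASH_PATTERN`, and `SAMPLE_LOGS`, the job identifiers, and the sample log lines (including the function name `foo_release`) are all hypothetical, and the marker list is an assumption that covers only a few well-known kernel report prefixes.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumed, non-exhaustive set of crash markers seen in Linux console output.
CRASH_PATTERN = re.compile(
    r"(KASAN: .+|BUG: .+|WARNING: .+|general protection fault.*"
    r"|Kernel panic - not syncing.*)"
)


def find_crash(log_text):
    """Return the first crash-looking report title in a console log, or None."""
    for line in log_text.splitlines():
        match = CRASH_PATTERN.search(line)
        if match:
            return match.group(1).strip()
    return None


def triage(logs, max_workers=4):
    """Scan several console logs in parallel.

    `logs` maps a (hypothetical) job id to its console log text; the result
    maps each job id to a crash title, or None if no marker was found.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(find_crash, logs.values())
        return dict(zip(logs.keys(), results))


# Made-up sample logs, for illustration only.
SAMPLE_LOGS = {
    "job-1": "[   12.3] booting\n[   14.8] BUG: KASAN: use-after-free in foo_release\n",
    "job-2": "[   10.1] booting\n[   11.0] reproducer finished\n",
}

if __name__ == "__main__":
    for job_id, crash in triage(SAMPLE_LOGS).items():
        print(job_id, "->", crash or "no crash detected")
```

A real platform would additionally have to build each kernel, boot it in a virtual machine, and run a reproducer before any log exists to scan; the sketch covers only the final log-inspection step.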

large language model, machine learning, natural language

Technology:
- Information Technology
  - Software (1.00)
  - Artificial Intelligence
    - Machine Learning (0.77)
    - Natural Language > Large Language Model (0.65)