C2RUST-BENCH: A Minimized, Representative Dataset for C-to-Rust Transpilation Evaluation

Melih Sirlanci, Carter Yagemann, Zhiqiang Lin

Apr-22-2025 – arXiv.org Artificial Intelligence

Despite two decades of effort in vulnerability detection, memory-safety vulnerabilities remain a critical problem. Recent reports suggest that the key solution is to migrate to memory-safe languages. To this end, C-to-Rust transpilation has become a popular way to resolve memory-safety issues in C programs. Recent works propose C-to-Rust transpilation frameworks; however, a comprehensive evaluation dataset is missing. Although one solution is to assemble a sufficiently large dataset, doing so increases the analysis time of automated frameworks and, in some cases, of manual effort. In this work, we build a method that selects functions from a large set to construct a minimized yet representative dataset for evaluating C-to-Rust transpilation. We propose C2RUST-BENCH, which contains 2,905 functions that are representative of C-to-Rust transpilation, selected from 15,503 functions of real-world programs.
