Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis

Manish Shetty, Naman Jain, Adwait Godbole, Sanjit A. Seshia, Koushik Sen

arXiv.org Artificial Intelligence 

Translating C to Rust is further motivated by the fact that both languages can target similar applications (low-level, performance-critical libraries) and are supported by LLVM-based compiler toolchains. Though desirable, C-to-Rust translation is challenging: C and (safe) Rust employ different type systems (strongly typed variables and no raw-pointer dereferencing in safe Rust) and different variable access rules (arbitrary accesses in C versus strict borrowing rules in Rust), among other differences. Manual migration of even moderately sized codebases requires multiple person-weeks of effort, which motivates automatic translation techniques. There are two main approaches to code translation: rule-based/symbolic and LLM (Large Language Model)-based. Rule-based translation approaches often operate on a terse intermediate representation (to achieve full coverage with a limited rule set) and thus often produce target code that is hard to read. Symbolic program synthesis approaches (e.g., [2, 34]), on the other hand, often do not scale to multi-function codebases. LLMs do well in both respects: they produce natural, readable code and scale better to larger codebases.
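To make the typing and access-rule differences concrete, the following is a minimal sketch and is not taken from the paper. The function names `sum` and `scale_in_place` and the C snippet in the comments are hypothetical, chosen only for illustration. It shows how a C pointer-plus-length parameter might map to a slice in safe Rust, and how in-place mutation requires an exclusive borrow.

```rust
// Hypothetical illustration; not code from the paper.
//
// A C original might look like this (pointer plus separate length,
// with no checks on how `xs` is accessed):
//
//   int sum(const int *xs, size_t n) {
//       int total = 0;
//       for (size_t i = 0; i < n; i++) total += xs[i];
//       return total;
//   }

/// Safe Rust version: the pointer/length pair becomes a slice,
/// which carries its own length and is bounds-checked on indexing.
pub fn sum(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

/// In-place update through an exclusive (mutable) borrow.
/// While this borrow is live, the borrow checker rejects any other
/// access to the same data, unlike aliasing pointers in C.
pub fn scale_in_place(xs: &mut [i32], factor: i32) {
    for x in xs.iter_mut() {
        *x *= factor;
    }
}
```

A caller would pass `&data` to `sum` and `&mut data` to `scale_in_place`. Holding a shared reference into `data` across the call to `scale_in_place` would be rejected at compile time, which is one instance of the strict borrowing rules mentioned above.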
