Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis

Shetty, Manish, Jain, Naman, Godbole, Adwait, Seshia, Sanjit A., Sen, Koushik

Dec-21-2024–arXiv.org Artificial Intelligence

The choice of Rust as the target is further motivated by the fact that both C and Rust target similar applications (low-level, performance-critical libraries) and are supported by LLVM-based compiler toolchains. Though desirable, C-to-Rust translation is challenging: C and (safe) Rust employ different type systems (Rust has strongly typed variables and no raw pointers) and different variable access rules (arbitrary accesses in C versus strict borrowing rules in Rust), among other differences. Manual migration of even moderately sized codebases requires multiple person-weeks of effort, motivating the need for automatic translation techniques. There are two main approaches to code translation: rule-based/symbolic and LLM (Large Language Model)-based. Rule-based translation approaches often operate on a terse intermediate representation (to achieve full coverage with a limited rule set) and thus often produce uninterpretable target code. Symbolic program synthesis approaches (e.g., [2, 34]), on the other hand, often do not scale to multi-function codebases. LLMs shine in both these respects: they produce natural, interpretable code and scale better to larger codebases.
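
The difference in access rules between C and safe Rust can be made concrete with a small sketch. The C function shown in the comment and the Rust names `scale` and `sum` are hypothetical illustrations, not code from the paper or its benchmarks; the sketch only assumes the standard borrowing rules of safe Rust.

```rust
// Hypothetical C original, shown only for comparison:
//
//   void scale(int *data, size_t n, int k) {
//       for (size_t i = 0; i < n; i++) data[i] *= k;
//   }
//
// In C, nothing stops `n` from exceeding the real buffer length, and the
// caller may keep other pointers into `data` while it is being mutated.

/// Safe Rust counterpart (illustrative): the slice carries its own length,
/// and `&mut` grants exclusive access for the duration of the call.
pub fn scale(data: &mut [i32], k: i32) {
    for x in data.iter_mut() {
        *x *= k;
    }
}

/// Read-only access takes a shared borrow; any number of these may coexist,
/// but none may overlap with a live `&mut` borrow of the same data.
pub fn sum(data: &[i32]) -> i64 {
    data.iter().map(|&x| i64::from(x)).sum()
}

fn main() {
    let mut v = vec![1, 2, 3];
    scale(&mut v, 2); // exclusive borrow ends when the call returns
    println!("{}", sum(&v)); // prints 12
}
```

A translator therefore has to decide, for each C pointer parameter, whether it corresponds to a shared borrow, an exclusive borrow, or an owned value, which is information the C signature alone does not state.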
