AITopics | coderosetta

CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Neural Information Processing Systems | Mar-22-2026, 04:40:26 GMT

Automatic translation of programming languages has garnered renewed interest, driven by recent advancements in large language models (LLMs). Encoder-decoder transformer models, in particular, have shown promise in translating between different programming languages. However, translating between a language and its high-performance computing (HPC) extension remains underexplored because of inherent challenges such as understanding complex parallel semantics. In this paper, we introduce CodeRosetta, an encoder-decoder transformer model explicitly designed for translating between programming languages and their HPC extensions. CodeRosetta is evaluated on C++ to CUDA and Fortran to C++ translation. It employs a customized learning-based framework with tailored pretraining and training objectives that enable it to capture code semantics and parallel structural nuances, allowing for bidirectional code translation. Our results show that CodeRosetta outperforms state-of-the-art baselines in C++ to CUDA translation by 2.9 BLEU and 1.72 CodeBLEU points while improving compilation accuracy by 6.05%. Compared to general closed-source LLMs, our proposed bidirectional learning-based method improves C++ to CUDA translation by 22.08 BLEU and 14.39 CodeBLEU points, with 2.75% higher compilation accuracy. Finally, CodeRosetta exhibits proficiency in Fortran to parallel C++ translation, marking it, to our knowledge, as the first encoder-decoder model for such a complex translation task, improving CodeBLEU by at least 4.63 points compared to closed-source and open-code LLMs.

large language model, natural language, translation, (10 more...)

Genre: Research Report > New Finding (0.59)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

b6edb87876bec4ac2260bffa083cb992-Paper-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 16:34:07 GMT

large language model, machine learning, translation, (22 more...)

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
North America > United States > Iowa > Story County > Ames (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(3 more...)

b6edb87876bec4ac2260bffa083cb992-Paper-Conference.pdf

Neural Information Processing Systems | Oct-11-2025, 00:37:28 GMT

blockidx, coderosetta, translation, (15 more...)

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
North America > United States > Iowa > Story County > Ames (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(3 more...)

CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Neural Information Processing Systems | May-27-2025, 13:49:13 GMT

Automatic translation of programming languages has garnered renewed interest, driven by recent advancements in large language models (LLMs). Encoder-decoder transformer models, in particular, have shown promise in translating between different programming languages. However, translating between a language and its high-performance computing (HPC) extension remains underexplored because of inherent challenges such as understanding complex parallel semantics. In this paper, we introduce CodeRosetta, an encoder-decoder transformer model explicitly designed for translating between programming languages and their HPC extensions. CodeRosetta is evaluated on C++ to CUDA and Fortran to C++ translation. It employs a customized learning-based framework with tailored pretraining and training objectives that enable it to capture code semantics and parallel structural nuances, allowing for bidirectional code translation. Our results show that CodeRosetta outperforms state-of-the-art baselines in C++ to CUDA translation by 2.9 BLEU and 1.72 CodeBLEU points while improving compilation accuracy by 6.05%.

large language model, natural language, translation, (10 more...)

Country: Asia > Middle East > Iran > Tehran Province > Tehran (0.08)

Genre: Research Report > New Finding (0.61)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

TehraniJamsaz, Ali, Bhattacharjee, Arijit, Chen, Le, Ahmed, Nesreen K., Yazdanbakhsh, Amir, Jannesari, Ali

arXiv.org Artificial Intelligence | Oct-27-2024

Recent advancements in Large Language Models (LLMs) have renewed interest in automatic programming language translation. Encoder-decoder transformer models, in particular, have shown promise in translating between different programming languages. However, translating between a language and its high-performance computing (HPC) extensions remains underexplored due to challenges such as complex parallel semantics. In this paper, we introduce CodeRosetta, an encoder-decoder transformer model designed specifically for translating between programming languages and their HPC extensions. CodeRosetta is evaluated on C++ to CUDA and Fortran to C++ translation tasks. It uses a customized learning framework with tailored pretraining and training objectives to effectively capture both code semantics and parallel structural nuances, enabling bidirectional translation. Our results show that CodeRosetta outperforms state-of-the-art baselines in C++ to CUDA translation by 2.9 BLEU and 1.72 CodeBLEU points while improving compilation accuracy by 6.05%. Compared to general closed-source LLMs, our method improves C++ to CUDA translation by 22.08 BLEU and 14.39 CodeBLEU, with 2.75% higher compilation accuracy. Finally, CodeRosetta exhibits proficiency in Fortran to parallel C++ translation, marking it, to our knowledge, as the first encoder-decoder model for this complex task, improving CodeBLEU by at least 4.63 points compared to closed-source and open-code LLMs.

large language model, machine learning, programming language, (21 more...)

2410.20527

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
North America > United States > Iowa > Story County > Ames (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(2 more...)
