Statistical Machine Translation Is a Natural Fit for Automatic Identifier Renaming in Software Source Code

Lacomis, Jeremy (Carnegie Mellon University) | Jaffe, Alan (Carnegie Mellon University) | Schwartz, Edward J. (Carnegie Mellon University) | Le Goues, Claire (Carnegie Mellon University) | Vasilescu, Bogdan (Carnegie Mellon University)

AAAI Conferences 

Advances in natural language processing have led to a variety of successful tools and techniques for problems such as understanding, generating, and translating natural languages. Given this success, a natural question is whether these techniques can also be applied to programming languages. Initial results, however, have been mixed. Researchers who used statistical machine translation (SMT) to translate between programming languages found that a large percentage of the translated programs were not syntactically valid. On the other hand, SMT has been successfully employed to recover identifiers in obfuscated JavaScript code. In this paper, we discuss several differences between natural languages and programming languages that can thwart the successful application of NLP techniques to program transformation. We also discuss several strategies for coping with these differences in practice, drawing on our own experience using SMT to assign meaningful identifier names to variables in decompiled C programs.
