A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning
Fang Liu, Ge Li, Bolin Wei, Xin Xia, Ming Li, Zhiyi Fu, Zhi Jin
arXiv.org Artificial Intelligence
Code completion, one of the most useful features in integrated development environments (IDEs), can accelerate software development by suggesting libraries, APIs, and method names in real time. Recent studies have shown that statistical language models can improve the performance of code completion tools by learning from large-scale software repositories. However, these models suffer from two major drawbacks: a) the hierarchical structural information of programs is not fully utilized in the program's representation; b) semantic relationships in programs can span long distances, and existing LSTM-based language models are not sufficient to model such long-term dependencies. In this paper, we present a novel method that introduces hierarchical structural information into the representation of programs by considering the path from the predicting node to the root node. To capture long-term dependencies in the input programs, we apply the Transformer-XL network as the base language model. In addition, we propose a Multi-Task Learning (MTL) framework to learn two related code completion tasks jointly, where knowledge acquired from one task can benefit the other. Experiments on three real-world datasets demonstrate the effectiveness of our model compared with state-of-the-art methods.

As the complexity and scale of software development continue to grow, code completion has become an essential feature of Integrated Development Environments (IDEs). It can speed up software development by suggesting the next probable token based on existing code. However, traditional code completion tools rely on compile-time type information or heuristic rules to make recommendations [1], [2], which are costly and cannot capture developers' programming patterns well. To alleviate this problem, code completion research has shifted in recent years toward learning from large-scale codebases. Based on the observation that source code is repetitive and predictable [3], statistical language models are generally used for modeling source code. N-gram is one of the most widely used language models [3]-[5]. More recently, with the success of deep learning, source code modeling techniques have turned to Recurrent Neural Network (RNN) based models [2], [6]. In these models, a piece of source code is represented as a sequence of source code tokens or Abstract Syntax Tree (AST) nodes. Given a partial code sequence, the model computes the probability of the next token or AST node and recommends the one with the highest probability.
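The representation idea described above, encoding the path from the node being predicted up to the root of the AST, can be made concrete with a few lines of code. The sketch below is illustrative only: the `ASTNode` class and its field names are assumptions for this example, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ASTNode:
    """A minimal AST node: a type label, an optional value, and children."""
    type: str
    value: Optional[str] = None
    children: List["ASTNode"] = field(default_factory=list)
    parent: Optional["ASTNode"] = None

    def add_child(self, child: "ASTNode") -> "ASTNode":
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node: ASTNode) -> List[str]:
    """Collect the node-type labels on the path from `node` up to the root.

    This path supplies the hierarchical context that is added to the
    program's representation alongside the flattened node sequence.
    """
    path = []
    current = node.parent  # start from the parent of the predicting node
    while current is not None:
        path.append(current.type)
        current = current.parent
    return path


# Tiny example tree: Module -> FunctionDef -> Return -> NameLoad
root = ASTNode("Module")
func = root.add_child(ASTNode("FunctionDef", value="add"))
ret = func.add_child(ASTNode("Return"))
name = ret.add_child(ASTNode("NameLoad", value="x"))

print(path_to_root(name))  # ['Return', 'FunctionDef', 'Module']
```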
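The abstract does not spell out the two jointly learned tasks; in AST-based completion they are commonly the next node's type and its value, so the sketch below assumes that pairing. It also substitutes a vanilla `nn.TransformerEncoder` for Transformer-XL, whose segment-level recurrence and relative positional encoding are omitted for brevity; all module and dimension names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MTLCompletionModel(nn.Module):
    """Shared self-attentional encoder with two task-specific heads.

    A plain TransformerEncoder stands in for Transformer-XL here; the
    memory mechanism of Transformer-XL is left out to keep the sketch short.
    """

    def __init__(self, type_vocab: int, value_vocab: int, d_model: int = 256):
        super().__init__()
        self.type_emb = nn.Embedding(type_vocab, d_model)
        self.value_emb = nn.Embedding(value_vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)   # shared
        self.type_head = nn.Linear(d_model, type_vocab)    # task 1: next type
        self.value_head = nn.Linear(d_model, value_vocab)  # task 2: next value

    def forward(self, types, values):
        h = self.encoder(self.type_emb(types) + self.value_emb(values))
        last = h[:, -1, :]          # hidden state at the prediction position
        return self.type_head(last), self.value_head(last)


model = MTLCompletionModel(type_vocab=100, value_vocab=5000)
types = torch.randint(0, 100, (8, 32))    # batch of 8 sequences, length 32
values = torch.randint(0, 5000, (8, 32))
type_logits, value_logits = model(types, values)

# Joint training: sum the two task losses so the shared encoder learns both.
loss = (F.cross_entropy(type_logits, torch.randint(0, 100, (8,)))
        + F.cross_entropy(value_logits, torch.randint(0, 5000, (8,))))
loss.backward()

# Completion = recommend the highest-probability candidate for each task.
next_type = type_logits.argmax(dim=-1)
next_value = value_logits.argmax(dim=-1)
```

Summing the two losses is the simplest joint objective; how the tasks are weighted against each other is a tunable design choice, and the final `argmax` mirrors the completion step described above, where the candidate with the highest probability is recommended.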
Sep-16-2019