Context-Augmented Code Generation Using Programming Knowledge Graphs

Saberi, Iman, Fard, Fatemeh

arXiv.org Artificial Intelligence 

Large Language Models (LLMs) and Code-LLMs (CLLMs) have significantly improved code generation, but they frequently struggle with challenging and complex problems. Retrieval-augmented generation can supply helpful context, yet retrieval models often fail to find the most relevant context, and generation models, with limited context capacity, can hallucinate when given irrelevant data. We present a novel framework that leverages a Programming Knowledge Graph (PKG) to semantically represent and retrieve code. This approach enables fine-grained code retrieval by focusing on the most relevant segments while reducing irrelevant context through a tree-pruning technique. The PKG is coupled with a re-ranking mechanism that further reduces hallucinations by selectively integrating non-RAG solutions. We propose two PKG-based retrieval approaches, block-wise and function-wise, that optimize context granularity. Evaluations on the HumanEval and MBPP benchmarks show our method improves pass@1 accuracy by up to 20% and outperforms state-of-the-art models by up to 34% on MBPP. Our contributions include PKG-based retrieval, tree pruning to enhance retrieval precision, a re-ranking method for robust solution selection, and a Fill-in-the-Middle (FIM) enhancer module that automatically augments code with relevant comments and docstrings.

Large Language Models (LLMs) have significantly improved the performance of code-related tasks such as code generation (Huang et al., 2023; Roziere et al., 2023a; Li et al., 2023; Wang et al., 2023). As code-related models continue to emerge rapidly (Chen et al., 2021; Li et al., 2023; 2022; Roziere et al., 2023a; Zhu et al., 2024), most of these models rely on a natural-language-to-code (NL-to-Code) paradigm, which often lacks the ability to leverage existing contextual information (Wang et al., 2024). Generating a solution from scratch, without access to supplementary context, poses significant challenges (Wang et al., 2024), even for humans (Zhong et al., 2024).
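The tree-pruning idea behind the fine-grained retrieval can be illustrated with a minimal sketch. This is not the paper's implementation: the `Block` tree, the `prune` helper, and the lexical-overlap `relevance` score are all hypothetical stand-ins (the paper's PKG would use semantic representations), but the sketch shows the mechanism of discarding entire subtrees whose root block is irrelevant to the query, so only the most relevant code segments reach the generator's limited context window.

```python
# Hypothetical sketch of block-wise retrieval with tree pruning.
# A function's code is modeled as a tree of blocks; a subtree whose
# root block falls below a relevance threshold is pruned wholesale.
from dataclasses import dataclass, field


@dataclass
class Block:
    text: str
    children: list = field(default_factory=list)


def relevance(query_terms: set, block: Block) -> float:
    # Toy lexical overlap; stands in for a learned similarity score.
    words = set(block.text.lower().split())
    return len(query_terms & words) / max(len(query_terms), 1)


def prune(query: str, root: Block, threshold: float = 0.25) -> list:
    """Collect blocks that clear the relevance threshold, skipping
    the entire subtree under any block that does not."""
    terms = set(query.lower().split())
    kept = []

    def walk(node: Block):
        if relevance(terms, node) < threshold:
            return  # prune: this block and all its descendants are dropped
        kept.append(node.text)
        for child in node.children:
            walk(child)

    walk(root)
    return kept


tree = Block("parse csv file", [
    Block("open csv reader", [Block("handle header row")]),
    Block("send email notification"),
])
print(prune("parse csv", tree))  # → ['parse csv file', 'open csv reader']
```

In this toy run, "send email notification" and the "handle header row" subtree are cut, which mirrors how pruning trades a little recall for much less irrelevant context handed to the generation model.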
