LoRACode: LoRA Adapters for Code Embeddings

Saumya Chaturvedi, Aman Chadha, Laurent Bindschaedler

Mar-7-2025, arXiv.org Artificial Intelligence

Code embeddings are essential for semantic code search; however, current approaches often struggle to capture the precise syntactic and contextual nuances inherent in code. Open-source models such as CodeBERT and UniXcoder exhibit limitations in scalability and efficiency, while high-performing proprietary systems impose substantial computational costs. We introduce a parameter-efficient fine-tuning method based on Low-Rank Adaptation (LoRA) to construct task-specific adapters for code retrieval. Our approach reduces the number of trainable parameters to less than two percent of the base model, enabling rapid fine-tuning on extensive code corpora (2 million samples in 25 minutes on two H100 GPUs). Experiments demonstrate an increase of up to 9.1% in Mean Reciprocal Rank (MRR) for Code2Code search, and up to 86.69% for Text2Code search tasks across multiple programming languages. Separating task-wise from language-wise adaptation helps explore how sensitive code retrieval is to syntactic and linguistic variation.
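To make two quantities in the abstract concrete, the following is a minimal sketch, not the authors' code. It counts the trainable parameters a rank-r LoRA update adds to a single weight matrix and computes Mean Reciprocal Rank from a list of ranks. The matrix size, the rank, the sample ranks, and both function names are illustrative assumptions and do not come from the paper; the paper's "less than two percent" figure refers to the whole base model, not to one matrix.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of a LoRA update W + B @ A for one d_out x d_in matrix.

    B has shape (d_out, rank) and A has shape (rank, d_in); W itself stays frozen.
    """
    return rank * (d_in + d_out)


def mean_reciprocal_rank(ranks):
    """MRR over queries.

    `ranks` holds the 1-based rank of the first relevant result for each query,
    or None when no relevant result was retrieved (that query contributes 0).
    """
    if not ranks:
        raise ValueError("ranks must contain at least one query")
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical sizes, chosen only for illustration.
d = 1024   # assumed hidden size of one square projection matrix
rank = 8   # assumed LoRA rank

full_params = d * d
lora_params = lora_param_count(d, d, rank)
fraction = lora_params / full_params

# Hypothetical retrieval outcome for four queries.
mrr = mean_reciprocal_rank([1, 2, 4, None])

print(f"LoRA params: {lora_params} of {full_params} ({fraction:.4%})")
print(f"MRR: {mrr}")
```

With these assumed sizes, the adapter for this one matrix trains 16,384 parameters against 1,048,576 frozen ones, about 1.56 percent. The four hypothetical queries give an MRR of (1 + 1/2 + 1/4 + 0) / 4 = 0.4375.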

large language model, machine learning, natural language

Country:
- North America > United States
  - California > Santa Clara County > Santa Clara (0.04)
- Europe
  - Germany (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology
  - Software (1.00)
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Natural Language
      - Large Language Model (0.47)
      - Text Processing (0.46)
