SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models
Hajipour, Hossein, Yu, Ning, Staicu, Cristian-Alexandru, Fritz, Mario
–arXiv.org Artificial Intelligence
Large code datasets have become increasingly accessible for pre-training source code models. However, for the fine-tuning phase, obtaining representative training data that fully covers the code distribution for specific downstream tasks remains challenging due to the task-specific nature and limited labeling resources. Moreover, fine-tuning pretrained models can result in forgetting previously acquired pre-training knowledge. These issues lead to out-of-distribution (OOD) generalization problems, with unexpected model inference behaviors that have not yet been systematically studied. In this paper, we contribute the first systematic approach that simulates various OOD scenarios along different dimensions of source code data properties and studies fine-tuned model behaviors in such scenarios. We investigate model behaviors under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning. Our comprehensive analysis, conducted on four state-of-the-art pretrained models and applied to two code generation tasks, exposes multiple failure modes attributed to OOD generalization issues. Additionally, our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.

There has been increasing success in applying Large Language Models (LLMs) to various source code understanding and generation tasks. LLMs for code, such as CodeBERT (Feng et al., 2020), GraphCodeBERT (Guo et al., 2021), CodeT5+ (Wang et al., 2023), CodeGen (Nijkamp et al., 2023), and Code Llama (Rozière et al., 2023), are pretrained on large-scale source code datasets and serve as universal initializations for a variety of downstream tasks.

[Figure 1: Our approach simulates out-of-distribution (OOD) scenarios and analyzes the corresponding behaviors of models.]
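One way to read "simulating OOD scenarios along dimensions of source code data properties" is to hold out fine-tuning samples whose value of some property (e.g., length, language construct, or identifier distribution) falls in a masked region, and evaluate on the held-out set. The sketch below illustrates this idea only; the predicate, threshold, and function name are illustrative assumptions, not the paper's actual protocol.

```python
def split_by_property(samples, is_ood):
    """Split a corpus into an in-distribution fine-tuning set and a
    held-out OOD evaluation set, based on a property predicate.

    `is_ood` marks samples whose property value lies in the masked
    (held-out) region of the data distribution.
    """
    in_dist = [s for s in samples if not is_ood(s)]
    ood = [s for s in samples if is_ood(s)]
    return in_dist, ood


# Illustrative example: hold out longer code snippets as the OOD split.
samples = ["def f(): pass", "def g(x):\n    return x * 2 + 1"]
train_set, ood_test_set = split_by_property(samples, lambda s: len(s) > 20)
```

Here the model would be fine-tuned on `train_set` and its generalization measured on `ood_test_set`, which by construction contains only samples from the masked region of the length dimension.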
The emerging abilities of LLMs, such as in-context learning, demonstrate their potential to handle a wide range of tasks (Wei et al., 2022; Brown et al., 2020). However, it has been shown that not all tasks can be effectively addressed by relying only on pretrained LLMs (Anil et al., 2022). To adapt pretrained models to specific tasks, they can be fine-tuned on task-specific datasets. This fine-tuning can involve optimizing all parameters or adopting a parameter-efficient approach (Houlsby et al., 2019; Hu et al., 2022), such as Low-Rank Adaptation (LoRA) (Hu et al., 2022).
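The core idea of LoRA (Hu et al., 2022) is to freeze the pretrained weight matrix W and learn only a low-rank update BA, so that the adapted layer computes Wx + (alpha/r)·BAx with far fewer trainable parameters. A minimal NumPy sketch, with illustrative class and parameter names (not tied to any particular LoRA library):

```python
import numpy as np


class LoRALinear:
    """Minimal sketch of a LoRA-adapted linear layer.

    The pretrained weight W stays frozen; only the low-rank factors
    A (rank x in) and B (out x rank) would be trained, adding
    rank * (in + out) parameters instead of out * in.
    """

    def __init__(self, in_features, out_features, rank=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(out_features, in_features))   # frozen
        self.A = rng.normal(scale=0.01, size=(rank, in_features))  # trainable
        self.B = np.zeros((out_features, rank))                 # trainable
        self.scale = alpha / rank

    def __call__(self, x):
        # y = x W^T + scale * (x A^T) B^T. Because B is initialized to
        # zero, the adapted layer starts out identical to the frozen one.
        return x @ self.W.T + (x @ self.A.T) @ self.B.T * self.scale
```

Since only A and B receive gradients, the pretrained knowledge encoded in W is never overwritten, which is one intuition for why LoRA might forget less of the pre-training distribution than full fine-tuning.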
Oct-30-2023