Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

Yuan, Zhiqiang; Liu, Junwei; Zi, Qiancheng; Liu, Mingwei; Peng, Xin; Lou, Yiling

Aug-2-2023, arXiv.org Artificial Intelligence

In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension and generation tasks. Our main findings are as follows. First, in the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks and are sometimes even better than small SOTA models specifically fine-tuned on each downstream task. We also find that larger instructed LLMs are not always better on code-related tasks. Second, in the few-shot setting, adding demonstration examples substantially helps instructed LLMs perform better on most code comprehension and generation tasks; however, the examples sometimes induce unstable or even worse performance. Furthermore, we find that the widely used BM25-based shot selection strategy significantly outperforms basic random selection or fixed selection only on generation problems. Third, in the fine-tuning setting, fine-tuning further improves model performance on downstream code comprehension and generation tasks compared to zero-shot/one-shot performance. In addition, after being fine-tuned on the same downstream task dataset, instructed LLMs outperform both the small SOTA models and similarly scaled LLMs without instruction tuning. Based on our findings, we further present practical implications for model and usage recommendations, performance and cost trade-offs, and future directions.
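The BM25-based shot selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the tokenizer, the BM25 parameters, or the structure of the demonstration pool, so everything below is an assumption: a whitespace tokenizer, the common Okapi BM25 defaults `k1=1.5` and `b=0.75`, and hypothetical names (`tokenize`, `bm25_scores`, `select_shots`, and the `"input"` / `"output"` keys). It is not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (the abstract does not specify one).
    return text.lower().split()


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with Okapi BM25."""
    docs = [tokenize(d) for d in corpus]
    n = len(docs)
    if n == 0:
        return []
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(d) for d in docs) / n or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    q_terms = tokenize(query)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+ 1.0" inside the log).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def select_shots(query, pool, k=1):
    """Return the k pool examples whose inputs are most BM25-similar to the query."""
    scores = bm25_scores(query, [ex["input"] for ex in pool])
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return [pool[i] for i in ranked[:k]]


# Hypothetical demonstration pool, for illustration only.
pool = [
    {"input": "reverse a string", "output": "s[::-1]"},
    {"input": "sort a list of numbers in ascending order", "output": "sorted(xs)"},
    {"input": "parse a json file", "output": "json.load(f)"},
]
shots = select_shots("sort a list of integers", pool, k=1)
```

Random selection would instead draw `k` examples from `pool` uniformly, and fixed selection would reuse the same examples for every query; the sketch differs only in ranking the pool by lexical similarity to the query first.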

generation task, instruction-tuned llm, llm

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - Dominican Republic (0.04)
  - United States > California
    - Los Angeles County > Long Beach (0.04)
  - Canada
    - Ontario > Toronto (0.04)
    - British Columbia
      - Metro Vancouver Regional District > Vancouver (0.04)
      - Vancouver Island > Capital Regional District
        - Victoria (0.04)
- Europe
  - Austria > Vienna (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - Greece > Attica
    - Athens (0.04)
- Asia
  - Singapore (0.04)
  - China (0.04)
  - South Korea > Seoul
    - Seoul (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
- Africa > Rwanda
  - Kigali > Kigali (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Education (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.70)
