LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages

Diehl, Patrick, Nader, Nojoud, Moraru, Maxim, Brandt, Steven R.

arXiv.org Artificial Intelligence 

Large Language Models (LLMs) have made significant advances in various code-related tasks, particularly in generating source code from natural language descriptions (Zhao et al. (2023); Chang et al. (2024)). Their effectiveness is driven primarily by their extensive number of model parameters, the use of large and diverse datasets, and the immense computational resources employed during training (Kaplan et al. (2020)). Trained on vast corpora sourced from the web, these models are capable of capturing intricate patterns, linguistic subtleties, and semantic relationships. A wide range of models is available for code generation: general-purpose models such as ChatGPT (Ouyang et al. (2022)), GPT-4 (Achiam et al. (2023)), and LLaMA (Touvron et al. (2023a)), which are designed for a broad range of applications, as well as specialized models such as StarCoder, Code LLaMA (Roziere et al. (2023)), DeepSeek-Coder, and Code Gemma, which are optimized for code-related tasks. Code generation built on the latest advances in LLM technology is now an essential tool for many businesses, as well as an essential target for LLM developers, since programming languages can be viewed as different dialects of natural language (Athiwaratkun et al. (2022)).