code generation model
EffiBench: Benchmarking the Efficiency of Automatically Generated Code
Code generation models have increasingly become integral to aiding software development. Although current research has thoroughly examined the correctness of the code produced by code generation models, a vital aspect that plays a pivotal role in greencomputing and sustainability efforts -- the efficiency of the generated code -- has often been neglected. This paper presents Effibench, a benchmark with 1,000 efficiency-critical coding problems to assess the efficiency of code generated by code generation models. EffiBench contains a diverse set of LeetCode coding problems. Each problem is paired with an executable human-written canonical solution, which obtains the SOTA efficiency on the LeetCode solution leaderboard. With EffiBench, we empirically examine the ability of 42 large language models (35 open-source and 7 closed-source) to generate efficient code. Our evaluation results demonstrate that the efficiency of the code generated by LLMs is generally worse than the efficiency of human-written canonical solutions. For example, GPT-4 generated code has an average \textbf{3.12}
Uncovering and Quantifying Social Biases in Code Generation
With the popularity of automatic code generation tools, such as Copilot, the study of the potential hazards of these tools is gaining importance. In this work, we explore the social bias problem in pre-trained code generation models. We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics to evaluate the overall social bias and fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) with varying sizes, reveal severe social biases. Moreover, we conduct analysis to provide useful insights for further choice of code generation models with low social bias.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)
- Europe > Austria > Vienna (0.14)
- Africa > Rwanda > Kigali > Kigali (0.04)
- North America > Canada (0.04)
- (9 more...)
Appendix Uncovering and Quantifying Social Biases in Code Generation
We conduct a preliminary study on finding a proper prompt construction strategy. Further research can utilize our analysis to construct more powerful code prompts. Table 1: Code prompt study results of CBS. N" means there are one human-relevant function Table 2: Automatic and human evaluation results of social biases in the generated code on GPT -4. We also conduct experiments on GPT -4.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.45)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (2 more...)
- Asia > China (0.28)
- Africa > Rwanda (0.14)
- North America > Canada (0.14)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.74)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
Appendix Uncovering and Quantifying Social Biases in Code Generation
We conduct a preliminary study on finding a proper prompt construction strategy. Further research can utilize our analysis to construct more powerful code prompts. Table 1: Code prompt study results of CBS. N" means there are one human-relevant function Table 2: Automatic and human evaluation results of social biases in the generated code on GPT -4. We also conduct experiments on GPT -4.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.45)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)