Is Functional Correctness Enough to Evaluate Code Language Models? Exploring Diversity of Generated Codes
