AITopics | Automatic Programming

Collaborating Authors

Automatic Programming

"Computer programming is the process of constructing executable code from fragmentary information. ... When computer programming is done by a machine, the process is called automatic programming. AI researchers are interested in studying automatic programming for two reasons: First, it would be highly useful to have a powerful automatic programming systems that could receive casual and imprecise specifications for a desired target program and then correctly generate that program; second, automatic programming is widely believed to be a necessary component of any intelligent system and is therefore a topic for fundamental research in its own right."
– excerpt from Biermann, A. 1992. Automatic Programming. In Encyclopedia of Artificial Intelligence. 2nd edition, Stuart C. Shapiro, editor, 18 - 35. New York: John Wiley & Sons.

News Overviews Instructional Materials AI-Alerts Classics

IndicEval-XL: Bridging Linguistic Diversity in Code Generation Across Indic Languages

Singh, Ujjwal, Sharma, Aditi, Gupta, Nikhil, Deepakshi, null, Jha, Vivek Kumar

arXiv.org Artificial IntelligenceFeb-26-2025

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation from natural language prompts, revolutionizing software development workflows. As we advance towards agent-based development paradigms, these models form the cornerstone of next-generation software development lifecycles. However, current benchmarks for evaluating multilingual code generation capabilities are predominantly English-centric, limiting their applicability across the global developer community. To address this limitation, we present IndicEval-XL, a comprehensive benchmark for code generation that incorporates 6 major Indic languages, collectively spoken by approximately 14\% of the world's population. Our benchmark bridges these languages with 12 programming languages, creating a robust evaluation framework. This work is particularly significant given India's representation of one-eighth of the global population and the crucial role Indic languages play in Indian society. IndicEval-XL represents a significant step toward expanding the linguistic diversity in code generation systems and evaluation frameworks. By developing resources that support multiple languages, we aim to make AI-powered development tools more inclusive and accessible to developers of various linguistic backgrounds. To facilitate further research and development in this direction, we make our dataset and evaluation benchmark publicly available at https://github.com/telekom/IndicEval-XL

gemini 2, natural language gemini 1, sanskrit 0, (14 more...)

arXiv.org Artificial Intelligence

2502.19067

Country:

North America > United States (0.05)
Asia > Pakistan (0.04)
Asia > Nepal (0.04)
(14 more...)

Genre:

Workflow (0.54)
Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ScriptoriumWS: A Code Generation Assistant for Weak Supervision

Huang, Tzu-Heng, Cao, Catherine, Schoenberg, Spencer, Vishwakarma, Harit, Roberts, Nicholas, Sala, Frederic

arXiv.org Artificial IntelligenceFeb-17-2025

Weak supervision is a popular framework for overcoming the labeled data bottleneck: the need to obtain labels for training data. In weak supervision, multiple noisy-but-cheap sources are used to provide guesses of the label and are aggregated to produce high-quality pseudolabels. These sources are often expressed as small programs written by domain experts -- and so are expensive to obtain. Instead, we argue for using code-generation models to act as coding assistants for crafting weak supervision sources. We study prompting strategies to maximize the quality of the generated sources, settling on a multi-tier strategy that incorporates multiple types of information. We explore how to best combine hand-written and generated sources. Using these insights, we introduce ScriptoriumWS, a weak supervision system that, when compared to hand-crafted sources, maintains accuracy and greatly improves coverage.

automatic programming, code generation assistant, machine learning, (3 more...)

arXiv.org Artificial Intelligence

2502.12366

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.60)
Information Technology > Artificial Intelligence > Machine Learning (0.53)

Add feedback

UniGenCoder: Merging Seq2Seq and Seq2Tree Paradigms for Unified Code Generation

Shao, Liangying, Yan, Yanfu, Poshyvanyk, Denys, Su, Jinsong

arXiv.org Artificial IntelligenceFeb-17-2025

Deep learning-based code generation has completely transformed the way developers write programs today. Existing approaches to code generation have focused either on the Sequence-to-Sequence paradigm, which generates target code as a sequence of tokens, or the Sequence-to-Tree paradigm, which outputs code as a sequence of actions. While these two paradigms are intuitively complementary, their combination has not been previously explored. By comparing the code generated under these two paradigms, we find that integrating them holds significant potential. In this paper, we propose UniGenCoder for code-related generation tasks, which consists of a shared encoder, a shared decoder with a minimal set of additional parameters to unify two paradigms, and a selector that dynamically chooses optimal paradigm for each instance. Also, during the model training, we first perform the multi-task learning and distillation strategies to facilitate knowledge transfer between two paradigms, and then leverage contrastive learning to train the selector. Experimental results on the text-to-code and code-to-code generation tasks demonstrate the effectiveness of our proposed model. We release our code at https://github.com/DeepLearnXMU/UniGenCoder.

artificial intelligence, codet5, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2502.1249

Country: Asia > China > Fujian Province (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging

Islam, Md. Ashraful, Ali, Mohammed Eunus, Parvez, Md Rizwan

arXiv.org Artificial IntelligenceFeb-8-2025

Large Language Models (LLMs) have made significant strides in code generation and problem solving. Current approaches employ external tool-based iterative debuggers that use compiler or other tool-based runtime feedback to refine coarse programs generated by various methods. However, the effectiveness of these approaches heavily relies on the quality of the initial code generation, which remains an open challenge. In this paper, we introduce CodeSim, a novel multi-agent code generation framework that comprehensively addresses the stages of program synthesis-planning, coding, and debugging-through a human-like perception approach. As human verifies their understanding of any algorithms through visual simulation, CodeSim uniquely features a method of plan verification and internal debugging through the step-by-step simulation of input/output. Extensive experiments across seven challenging competitive problem-solving and program synthesis benchmarks demonstrate CodeSim's remarkable code generation capabilities. Our framework achieves new state-of-the-art (pass@1) results-(HumanEval 95.1%, MBPP 90.7%, APPS 22%, and CodeContests 29.1%). Furthermore, our method shows potential for even greater enhancement when cascaded with external debuggers. To facilitate further research and development in this area, we have open-sourced our framework in this link (https://kagnlp.github.io/codesim.github.io/).

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.05664

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Singapore (0.04)
(8 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Proving the Coding Interview: A Benchmark for Formally Verified Code Generation

Dougherty, Quinn, Mehta, Ronak

arXiv.org Artificial IntelligenceFeb-8-2025

We introduce the Formally Verified Automated Programming Progress Standards, or FVAPPS, a benchmark of 4715 samples for writing programs and proving their correctness, the largest formal verification benchmark, including 1083 curated and quality controlled samples. Previously, APPS provided a benchmark and dataset for programming puzzles to be completed in Python and checked against unit tests, of the kind seen in technical assessments in the software engineering industry. Building upon recent approaches for benchmarks in interactive theorem proving, we generalize the unit tests to Lean 4 theorems given without proof (i.e., using Lean's "sorry" keyword). On the 406 theorems of 100 randomly selected samples, Sonnet correctly proves 30% and Gemini correctly proves 18%. We challenge the machine learning and program synthesis communities to solve both each general purpose programming problem and its associated correctness specifications. The benchmark is available at https://huggingface.co/datasets/quinn-dougherty/fvapps.

theorem, theorem statement, voter, (15 more...)

arXiv.org Artificial Intelligence

2502.05714

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
(2 more...)

Add feedback

CodeSCM: Causal Analysis for Multi-Modal Code Generation

Gupta, Mukur, Bhatt, Noopur, Jana, Suman

arXiv.org Artificial IntelligenceFeb-7-2025

In this paper, we propose CodeSCM, a Structural Causal Model (SCM) for analyzing multi-modal code generation using large language models (LLMs). By applying interventions to CodeSCM, we measure the causal effects of different prompt modalities, such as natural language, code, and input-output examples, on the model. CodeSCM introduces latent mediator variables to separate the code and natural language semantics of a multi-modal code generation prompt. Using the principles of Causal Mediation Analysis on these mediators we quantify direct effects representing the model's spurious leanings. We find that, in addition to natural language instructions, input-output examples significantly influence code generation.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2502.0515

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Greenland (0.04)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Large Language Model Guided Self-Debugging Code Generation

Adnan, Muntasir, Xu, Zhiwei, Kuhn, Carlos C. N.

arXiv.org Artificial IntelligenceFeb-5-2025

Automated code generation is gaining significant importance in intelligent computer programming and system deployment. However, current approaches often face challenges in computational efficiency and lack robust mechanisms for code parsing and error correction. In this work, we propose a novel framework, PyCapsule, with a simple yet effective two-agent pipeline and efficient self-debugging modules for Python code generation. PyCapsule features sophisticated prompt inference, iterative error handling, and case testing, ensuring high generation stability, safety, and correctness. Empirically, PyCapsule achieves up to 5.7% improvement of success rate on HumanEval, 10.3% on HumanEval-ET, and 24.4% on BigCodeBench compared to the state-of-art methods. We also observe a decrease in normalized success rate given more self-debugging attempts, potentially affected by limited and noisy error feedback in retention. PyCapsule demonstrates broader impacts on advancing lightweight and efficient code generation for artificial intelligence systems.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.02928

Country: North America > United States (0.47)

Genre:

Research Report (0.51)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Process-Supervised Reinforcement Learning for Code Generation

Ye, Yufan, Zhang, Ting, Jiang, Wenbin, Huang, Hua

arXiv.org Artificial IntelligenceFeb-3-2025

Existing reinforcement learning strategies based on outcome supervision have proven effective in enhancing the performance of large language models(LLMs) for code generation. While reinforcement learning based on process supervision has shown great promise in handling multi-step reasoning tasks, its effectiveness in code generation remains largely underexplored and underjustified. The primary obstacle stems from the resource-intensive nature of constructing high-quality process-supervised data, which demands substantial human expertise and computational resources. In response to this challenge, we propose a "statement mutation/refactoring-compile and execution verification" strategy: mutating and refactoring code line-by-line through a teacher model, and utilizing compiler execution results to automatically label each line, resulting in line-by-line process-supervised data, which is pivotal for training a process-supervised reward model. The trained reward model is then integrated into the PRLCoder framework, followed by experimental validation on several benchmarks. Experimental results demonstrate that process-supervised reinforcement learning significantly surpasses methods relying solely on outcome supervision. Notably, in tackling complex code generation tasks, process-supervised reinforcement learning shows a clear advantage, ensuring both the integrity of the code generation process and the correctness of the generation results.

large language model, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2502.01715

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reviews: Code Generation as a Dual Task of Code Summarization

Neural Information Processing SystemsJan-27-2025, 18:21:56 GMT

This paper presents an interesting approach of using the duality relationship between Code Summarization (CS) and Code Generation (CG) to improve the performance of a neural model on both tasks simultaneously. The main idea is to exploit the fact that the conditional probability of a comment given some source code, and the conditional probability of source code given a comment, are both related by their common joint probability. Moreover, since both the tasks of CS and CG use an attention-based seq2seq architecture, this paper also proposes to add an additional constraint that the two attention vectors have similar distributions, i.e. the attention weight of comment word i to source token j for the CS task is similar to the attention weights of the same pair for the CG task. The method is evaluated on two datasets of Java and Python programs/comment pairs and the dual training outperforms several baseline methods including the same architecture trained without dual constraints (basic model). Overall, I liked the idea of exploiting the dual relationship between the code summarization and code generation tasks. The proposed dual regularization terms relating to the factorization of conditional probability distributions and similarity of attention matrices are quite elegant.

code generation, code summarization, constraint, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.83)

Add feedback

Reviews: Code Generation as a Dual Task of Code Summarization

Neural Information Processing SystemsJan-27-2025, 18:06:28 GMT

All reviewers liked the dual relationship between the code summarization and code generation tasks. They were also satisfied with the implementation and experiments. These problems are both difficult and important, hence progress is of interest even if such dualities have been identified in other contexts.

code generation, code summarization, dual task

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.82)

Add feedback