AITopics | Automatic Programming

Collaborating Authors

Automatic Programming

"Computer programming is the process of constructing executable code from fragmentary information. ... When computer programming is done by a machine, the process is called automatic programming. AI researchers are interested in studying automatic programming for two reasons: First, it would be highly useful to have a powerful automatic programming systems that could receive casual and imprecise specifications for a desired target program and then correctly generate that program; second, automatic programming is widely believed to be a necessary component of any intelligent system and is therefore a topic for fundamental research in its own right."
– excerpt from Biermann, A. 1992. Automatic Programming. In Encyclopedia of Artificial Intelligence. 2nd edition, Stuart C. Shapiro, editor, 18 - 35. New York: John Wiley & Sons.

News Overviews Instructional Materials AI-Alerts Classics

QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation

Zhang, Yang, Zhang, Rui, Guo, Jiaming, Huang, Lei, Huang, Di, Zhao, Yunpu, Cheng, Shuyao, Jin, Pengwei, Li, Chongxiao, Du, Zidong, Hu, Xing, Guo, Qi, Chen, Yunji

arXiv.org Artificial IntelligenceDec-9-2025

The remarkable progress of Large Language Models (LLMs) presents promising opportunities for Verilog code generation which is significantly important for automated circuit design. The lacking of meaningful functional rewards hinders the preference optimization based on Reinforcement Learning (RL) for producing functionally correct Verilog code. In this paper, we propose Signal-Aware Learning for Verilog code generation (QiMeng-SALV) by leveraging code segments of functionally correct output signal to optimize RL training. Considering Verilog code specifies the structural interconnection of hardware gates and wires so that different output signals are independent, the key insight of QiMeng-SALV is to extract verified signal-aware implementations in partially incorrect modules, so as to enhance the extraction of meaningful functional rewards. Roughly, we verify the functional correctness of signals in generated module by comparing with that of reference module in the training data. Then abstract syntax tree (AST) is employed to identify signal-aware code segments which can provide meaningful functional rewards from erroneous modules. Finally, we introduce signal-aware DPO which is optimized on the correct signal-level code segments, thereby preventing noise and interference from incorrect signals. The proposed QiMeng-SALV underscores the paradigm shift from conventional module-level to fine-grained signal-level optimization in Verilog code generation, addressing the issue of insufficient functional rewards. Experiments demonstrate that our method achieves state-of-the-art performance on VerilogEval and RTLLM, with a 7B parameter model matching the performance of the DeepSeek v3 671B model and significantly outperforming the leading open-source model CodeV trained on the same dataset. Our code is available at https://github.com/QiMeng-IPRC/QiMeng-SALV.

large language model, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2510.19296

Country: Asia > China (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

NALA_MAINZ at BLP-2025 Task 2: A Multi-agent Approach for Bangla Instruction to Python Code Generation

Saadi, Hossain Shaikh, Alam, Faria, Sanz-Guerrero, Mario, Bui, Minh Duc, Mager, Manuel, von der Wense, Katharina

arXiv.org Artificial IntelligenceNov-24-2025

This paper presents JGU Mainz's winning system for the BLP-2025 Shared Task on Code Generation from Bangla Instructions. We propose a multi-agent-based pipeline. First, a code-generation agent produces an initial solution from the input instruction. The candidate program is then executed against the provided unit tests (pytest-style, assert-based). Only the failing cases are forwarded to a debugger agent, which reruns the tests, extracts error traces, and, conditioning on the error messages, the current program, and the relevant test cases, generates a revised solution. Using this approach, our submission achieved first place in the shared task with a $Pass@1$ score of 95.4. We also make our code public.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.16787

Country:

Europe > Germany > Rheinland-Pfalz > Mainz (0.61)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
Europe > Germany > Saarland (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

Code Generation as a Dual Task of Code Summarization

Bolin Wei, Ge Li, Xin Xia, Zhiyi Fu, Zhi Jin

Neural Information Processing SystemsNov-19-2025, 08:48:05 GMT

On the other hand, CG is an indispensable process in which programmers write code to implement specific intents [Balzer, 1985]. Proper comments and correct code can massively improve programmers' productivity and enhance software quality.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
Oceania > Australia (0.04)
North America > Canada (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.65)

Add feedback

Enhancing LLM Code Generation Capabilities through Test-Driven Development and Code Interpreter

Jalil, Sajed, Saha, Shuvo, Seym, Hossain Mohammad

arXiv.org Artificial IntelligenceNov-18-2025

Over the past few years, improving LLM code generation capabilities has been a key focus in NLP research. Despite Bengali having 242 million native speakers worldwide, it receives little attention when it comes to training LLMs. More recently, various fine-tuning and augmented generation techniques have been employed to significantly enhance code generation performance. However, they require considerable expertise and resources to utilize effectively as an end user. The goal of our work is to democratize access to powerful code generation tools in resource-constrained emerging markets, enabling users to leverage them in their native language. We introduce a novel approach that combines Test-Driven Development (TDD) and Code Interpreter (CI), utilizing open-weight models, which improves the baseline accuracy for code generation with Bengali prompts and achieves an overall accuracy of 85%. Our approach requires no finetuning and proves that even the smallest models in the same family can attain up to 98% accuracy compared to the largest models. All of our results are publicly shared in GitHub for validation and reproducibility.

accuracy, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.12823

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.05)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Add feedback

Appendix Uncovering and Quantifying Social Biases in Code Generation

Neural Information Processing SystemsNov-13-2025, 09:13:18 GMT

We conduct a preliminary study on finding a proper prompt construction strategy. Further research can utilize our analysis to construct more powerful code prompts. Table 1: Code prompt study results of CBS. N" means there are one human-relevant function Table 2: Automatic and human evaluation results of social biases in the generated code on GPT -4. We also conduct experiments on GPT -4.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China > Hong Kong (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.45)

Add feedback

Uncovering and Quantifying Social Biases in Code Generation Yan Liu Xiaokang Chen null Yan Gao

Neural Information Processing SystemsNov-13-2025, 09:13:15 GMT

We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models.

code generation model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.67)

Add feedback

WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

Xu, Mingde, Yang, Zhen, Hong, Wenyi, Pan, Lihang, Fan, Xinyue, Wang, Yan, Gu, Xiaotao, Xu, Bin, Tang, Jie

arXiv.org Artificial IntelligenceNov-11-2025

User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. While recent Vision-Language Models (VLMs) automate UI-to-Code generation, they generate only static HTML/CSS/JavaScript layouts lacking interactivity. To address this, we propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: 1) an exploration agent to capture multi-state UI screenshots; 2) a UI2Code model that generates executable interactive code; 3) a validation module that verifies the interactivity. Experiments demonstrate that WebVIA-Agent achieves more stable and accurate UI exploration than general-purpose agents (e.g., Gemini-2.5-Pro). In addition, our fine-tuned WebVIA-UI2Code models exhibit substantial improvements in generating executable and interactive HTML/CSS/JavaScript code, outperforming their base counterparts across both interactive and static UI2Code benchmarks. Our code and models are available at \href{https://zheny2751-dotcom.github.io/webvia.github.io/}{\texttt{https://webvia.github.io}}.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.06251

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.84)

Add feedback

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

Zhu, Speed, Cai, Jianwei, Chen, Guang, Wu, Lulu, Yang, Saiyong, Zhou, Wiggin

arXiv.org Artificial IntelligenceNov-11-2025

Recent reasoning-first models (e.g., OpenAI o1, DeepSeek R1) have spurred a resurgence of interest in RLVR. Nevertheless, advances are dominated by mathematics (e.g., AIME), with competitive-programming code generation underexplored and data curation receiving less attention than RL algorithm design. We investigate how to construct RLVR datasets (i.e., RL prompts) and present practical training techniques that yield strong performance on competitive-programming code generation. Our pipeline begins with supervised fine-tuning (SFT) distilled from strong open-source models, augmented with general-purpose and reasoning-intensive data. RL then follows a two-stage process with executable, testcase-driven rewards: first, training on a large, uniformly distributed set of competitive-programming problems using Group Relative Policy Optimization (GRPO) with 8 rollouts per prompt and a relatively short response-generation window (e.g., 32k during SFT and 24k in this stage) to expand entropy and mitigate repetition and truncation; second, we perform \textbf{Pre-GRPO}: updating on a small, high-quality set of challenging problems with a large rollout budget (64 rollouts per prompt) under a hard-focus curriculum that continuously retains the most difficult instances throughout training. We implement our method on Qwen2.5-32B and evaluate on LeetCode and Codeforces weekly contests to avoid data leakage. The resulting model achieves state-of-the-art performance among models of similar scale and is comparable to leading systems such as DeepSeek v3.1 and Doubao-1.5-Thinking. We also examine scaling trends and observe strong RL scaling on an internal large-scale MoE model. Our study distills concise best practices for data curation, entropy expansion, and curriculum design in RLVR for competitive-programming code generation.

arxiv preprint arxiv, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2511.06307

Genre: Research Report (0.82)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CosmoCore Affective Dream-Replay Reinforcement Learning for Code Generation

Ravindran, Santhosh Kumar

arXiv.org Artificial IntelligenceOct-23-2025

We introduce CosmoCore, a neuroscience-inspired reinforcement learning (RL) architecture that integrates affective signals to enhance code generation in large language models (LLMs). Motivated by human and animal learning where embarrassment from mistakes drives rapid correction, as observed in training a puppy to avoid repeating errors after a single scolding CosmoCore tags code generation trajectories with valence and surprise using a lightweight multi-layer perceptron (MLP). High-negative valence (cringe) episodes, such as buggy code outputs, are prioritized in a Dream Queue for five-fold replay during off-policy updates, while low-surprise successes are pruned to prevent overconfidence and buffer bloat. Evaluated on code generation benchmarks like HumanEval and BigCodeBench, alongside simulations with a custom data pipeline environment, CosmoCore reduces hallucinated code (e.g., syntax errors or logical bugs) by 48\% and accelerates self-correction by 45\%. Local experiments using Hugging Face models in a PySpark environment validate these gains, with code snippets provided for replication. Ablations confirm valence tagging boosts curiosity in exploration, and pruning mitigates inefficiency. This framework extends RL from human feedback (RLHF) for more emotionally aware code assistants, with applications in IDEs and data pipelines. Code and the custom mini-world simulation are released.

cosmocore, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2510.18895

Country: North America > United States (0.14)

Genre:

Research Report (0.50)
Overview (0.46)

Industry:

Information Technology (0.46)
Health & Medicine (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)

Add feedback

High-Fidelity And Complex Test Data Generation For Google SQL Code Generation Services

Kannan, Shivasankari, Chung, Yeounoh, Gondi, Amita, Swadell, Tristan, Ozcan, Fatma

arXiv.org Artificial IntelligenceOct-22-2025

The demand for high-fidelity test data is paramount in industrial settings where access to production data is largely restricted. Traditional data generation methods often fall short, struggling with low-fidelity and the ability to model complex data structures and semantic relationships that are critical for testing complex SQL code generation services like Natural Language to SQL (NL2SQL). In this paper, we address the critical need for generating syntactically correct and semantically relevant high-fidelity mock data for complex data structures that includes columns with nested structures that we frequently encounter in Google workloads. We highlight the limitations of existing approaches used in production, particularly their inability to handle large and complex data structures, as well as the lack of semantically coherent test data that lead to limited test coverage. We demonstrate that by leveraging Large Language Models (LLMs) and incorporating strategic pre- and post-processing steps, we can generate syntactically correct and semantically relevant high-fidelity test data that adheres to complex structural constraints and maintains semantic integrity to the SQL test targets (queries/functions). This approach supports comprehensive testing of complex SQL queries involving joins, aggregations, and even deeply nested subqueries, ensuring robust evaluation of SQL code generation services, like NL2SQL and SQL Code Assistant. Our results demonstrate the practical utility of an LLM (\textit{gemini}) based test data generation for industrial SQL code generation services where generating high-fidelity test data is essential due to the frequent unavailability and inaccessibility of production datasets for testing.

data generation, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2504.17203

Country:

Europe > United Kingdom (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback