partial program
Review for NeurIPS paper: Learning Differentiable Programs with Admissible Neural Heuristics
Summary and Contributions: This work considers the problem of synthesizing programs from input/output examples, where some components of the program may have continuous parameters and the entire program is differentiable with respect to those parameters. Neurosymbolic programs are a special case of this setup (symbolic programs that can call out to neural modules as needed). This is an especially challenging combinatorial search problem: not only must we consider an infinitely large, discrete space of program structures, but also an inner-loop optimization over continuous parameters. Their approach performs an explicit symbolic graph search over the discrete space of partial programs. As a heuristic function for this graph search, they train neural networks to approximate the behavior of the incomplete portions of the program syntax tree.
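The search the review describes can be made concrete with a small sketch. Below is a toy A*-style loop over partial programs with holes, where `heuristic` stands in for the paper's trained neural network (here it returns 0, which is trivially admissible but uninformed, so the search degrades to breadth-first); the DSL, the cost model, and all names are illustrative assumptions, not the paper's implementation.

```python
import heapq
import itertools

# Toy DSL over one integer input x; "HOLE" marks an unexpanded subtree.
LEAVES = ["x", "1"]
OPS = ["add", "mul"]

def expand(tree):
    """Fill the leftmost HOLE in `tree`, returning all child partial programs."""
    if tree == "HOLE":
        return LEAVES + [(op, "HOLE", "HOLE") for op in OPS]
    if isinstance(tree, tuple):
        op, left, right = tree
        kids = expand(left)
        if kids:
            return [(op, k, right) for k in kids]
        return [(op, left, k) for k in expand(right)]
    return []  # complete leaf: nothing to fill

def complete(tree):
    if tree == "HOLE":
        return False
    return not isinstance(tree, tuple) or all(complete(t) for t in tree[1:])

def run(tree, x):
    if tree == "x":
        return x
    if tree == "1":
        return 1
    op, a, b = tree
    return run(a, x) + run(b, x) if op == "add" else run(a, x) * run(b, x)

def loss(tree, examples):
    return sum((run(tree, xi) - yi) ** 2 for xi, yi in examples)

def heuristic(tree, examples):
    # Stand-in for the paper's learned heuristic: a network trained to
    # *underestimate* the best loss any completion of `tree` can achieve
    # (that underestimation is what makes it admissible).
    return 0.0

def a_star(examples, budget=100_000):
    tie = itertools.count()  # tie-breaker so the heap never compares trees
    frontier = [(heuristic("HOLE", examples), next(tie), "HOLE")]
    while frontier and budget:
        budget -= 1
        _, _, tree = heapq.heappop(frontier)
        if complete(tree):
            if loss(tree, examples) == 0:
                return tree  # fits all examples exactly
            continue
        for child in expand(tree):
            f = (loss(child, examples) if complete(child)
                 else heuristic(child, examples))
            heapq.heappush(frontier, (f, next(tie), child))
    return None

print(a_star([(1, 2), (2, 3), (5, 6)]))  # finds ('add', 'x', '1')
```

A sharper learned heuristic would simply reorder the frontier; admissibility is what preserves the guarantee of finding a lowest-cost complete program.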
COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis
Program synthesis methods, whether formal or neural, lack fine-grained control and flexible modularity, which limits their adaptation to complex software development. These limitations stem from rigid Domain-Specific Language (DSL) frameworks and from incorrect neural network predictions. To address them, we propose the Chain of Logic (CoL), which organizes the synthesis process into an activity flow and provides heuristic control to guide that process. Furthermore, by integrating neural networks with libraries and introducing a Neural Network Feedback Control (NNFC) mechanism, our approach modularizes synthesis and mitigates the impact of neural network mispredictions. Experiments on relational and symbolic synthesis tasks show that CoL significantly improves the efficiency and reliability of DSL program synthesis across multiple metrics: it raises accuracy by 70% while cutting tree operations by 91% and synthesis time by 95%. NNFC further boosts accuracy by 6%, with a 64% reduction in tree operations under challenging conditions such as insufficient training data, increased task difficulty, and multidomain synthesis. These improvements establish COOL as a highly efficient and reliable program synthesis framework.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Russia (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
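The abstract does not spell out NNFC's mechanism, so the following is only a guess at the general shape of a feedback-control loop around a fallible neural predictor: a stub model ranks candidate rules, a checker vetoes bad applications, and each veto feeds back as a penalty on the offending rule. All names here (`score_rule`, `apply_rule`) are hypothetical, not COOL's API.

```python
import random

def nnfc_synthesize(state, goal, rules, score_rule, apply_rule, max_steps=50):
    """Greedy derivation loop: the model ranks rules, a checker vetoes bad
    applications, and each veto feeds back as a penalty on that rule."""
    penalty = {r: 0.0 for r in rules}   # accumulated misprediction cost
    trace = []
    for _ in range(max_steps):
        if state == goal:
            return trace
        ranked = sorted(rules, key=lambda r: score_rule(state, r) - penalty[r],
                        reverse=True)
        for rule in ranked:
            nxt = apply_rule(state, rule)
            if nxt is not None:         # checker accepted the step
                state = nxt
                trace.append(rule)
                break
            penalty[rule] += 1.0        # feedback: demote the bad suggestion
    return None

# Demo: build the string "aab"; the checker rejects any non-prefix, and the
# "model" guesses randomly, so the feedback term does the real work.
rules = ["a", "b"]
apply_rule = lambda s, r: s + r if "aab".startswith(s + r) else None
score_rule = lambda s, r: random.random()
print(nnfc_synthesize("", "aab", rules, score_rule, apply_rule))  # ['a', 'a', 'b']
```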
VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search
Brandfonbrener, David, Henniger, Simon, Raja, Sibi, Prasad, Tarun, Loughridge, Chloe, Cassano, Federico, Hu, Sabrina Ruixin, Yang, Jianang, Byrd, William E., Zinkov, Robert, Amin, Nada
Large Language Models (LLMs) can generate useful code, but often the code they generate cannot be trusted to be sound. In this paper, we present VerMCTS, an approach that begins to resolve this issue by generating verified programs in Dafny and Coq. VerMCTS uses a logical verifier in concert with an LLM to guide a modified Monte Carlo Tree Search (MCTS), leveraging the verifier for intermediate feedback inside the search algorithm: partial programs are checked at each step to estimate an upper bound on the value function. To measure the performance of VerMCTS, we develop a new suite of multi-step verified programming problems in Dafny and Coq. In terms of pass@T, a new metric that computes the pass rate given a budget of T tokens sampled from the LLM, VerMCTS yields more than a 30% absolute increase in average pass@5000 across the suite over repeated sampling from the base language model.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Alabama (0.04)
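A much-simplified sketch of the core idea above, using a toy bracket language in place of Dafny/Coq and a deterministic UCB search in place of the paper's LLM-driven MCTS. The point to notice is that the verifier scores partial programs: any prefix it rejects gets value 0 (an upper bound on its true value), so nothing below it is ever explored. Names and the language are illustrative assumptions, not VerMCTS's components.

```python
import math
from collections import defaultdict

TOKENS = ["(", ")", "<end>"]
MAX_LEN = 10

def viable(prefix):
    """Stand-in verifier: rejects any prefix that can no longer be completed."""
    depth = 0
    for t in prefix:
        depth += 1 if t == "(" else -1
        if depth < 0:
            return False
    return depth <= MAX_LEN - len(prefix)  # still closable within the budget

def terminal_reward(prefix):
    body = prefix[:-1]  # drop "<end>"
    ok = viable(body) and sum(1 if t == "(" else -1 for t in body) == 0
    return 1.0 if ok and body else 0.0

N = defaultdict(int)    # visit counts per (prefix, token)
Q = defaultdict(float)  # running mean value per (prefix, token)

def simulate(prefix):
    """One MCTS simulation. The verifier gives intermediate feedback: a
    rejected prefix is scored 0 immediately, so it is never expanded."""
    if prefix and prefix[-1] == "<end>":
        return terminal_reward(prefix)
    if len(prefix) >= MAX_LEN or not viable(prefix):
        return 0.0
    key = tuple(prefix)
    total = sum(N[(key, t)] for t in TOKENS) + 1
    # UCB over the toy token set; in VerMCTS an LLM prior reweights this.
    tok = max(TOKENS, key=lambda a: Q[(key, a)]
              + 1.4 * math.sqrt(math.log(total + 1) / (N[(key, a)] + 1)))
    value = simulate(prefix + [tok])
    N[(key, tok)] += 1
    Q[(key, tok)] += (value - Q[(key, tok)]) / N[(key, tok)]
    return value

for _ in range(2000):
    simulate([])
print("most-visited first token:", max(TOKENS, key=lambda t: N[((), t)]))  # "("
```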
Programming-by-Demonstration for Long-Horizon Robot Tasks
Patton, Noah, Rahmani, Kia, Missula, Meghana, Biswas, Joydeep, Dillig, Işil
The goal of programmatic Learning from Demonstration (LfD) is to learn, from a set of user demonstrations, a policy in a programming language that can be used to control a robot's behavior. This paper presents a new programmatic LfD algorithm that targets long-horizon robot tasks requiring programs with complex control flow, including nested loops with multiple conditionals. Our proposed method first learns a program sketch that captures the target program's control flow and then completes this sketch using an LLM-guided search procedure that incorporates a novel technique for proving unrealizability of programming-by-demonstration problems. We have implemented our approach in a new tool called PROLEX and present the results of a comprehensive experimental evaluation on 120 benchmarks involving complex tasks and environments. We show that, given a 120-second time limit, PROLEX finds a program consistent with the demonstrations in 80% of the cases. Furthermore, for 81% of the tasks for which a solution is returned, PROLEX finds the ground-truth program from just one demonstration. In comparison, CVC5, a syntax-guided synthesis tool, solves only 25% of the cases even when given the ground-truth program sketch, and an LLM-based approach, GPT-Synth, is unable to solve any of the tasks due to the environment complexity.
- North America > United States > Texas (0.14)
- Europe > Germany (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California (0.14)
- Research Report > New Finding (0.67)
- Research Report > Promising Solution (0.48)
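A toy rendering of the sketch-then-complete pipeline, under strong simplifying assumptions: the sketch is a single `repeat` loop with holes, demonstrations are action traces, and `unrealizable` plays the role of the unrealizability proof that prunes hopeless sketches before any enumeration. In this toy the check nearly determines the answer; PROLEX's programs have conditionals and nested loops, where pruning matters far more.

```python
from itertools import product

ACTIONS = ["up", "down", "left", "right"]

def unrolls_to(body, k, demo):
    return body * k == demo

def unrealizable(body_len, demo):
    """Cheap proof that no completion of the sketch can match: repeat(k, body)
    with a length-`body_len` body only produces traces whose length is a
    multiple of body_len, made of one repeated block."""
    if len(demo) % body_len != 0:
        return True
    block = demo[:body_len]
    return any(demo[i:i + body_len] != block
               for i in range(0, len(demo), body_len))

def complete_sketch(body_len, demo):
    """Enumerate hole assignments for the sketch repeat(?, [?1 .. ?body_len])."""
    if unrealizable(body_len, demo):
        return None                      # pruned without enumeration
    k = len(demo) // body_len
    for body in product(ACTIONS, repeat=body_len):
        if unrolls_to(list(body), k, demo):
            return ("repeat", k, list(body))
    return None

demo = ["up", "right", "up", "right", "up", "right"]
for n in (1, 2, 3):
    print(n, complete_sketch(n, demo))   # only n=2 yields ('repeat', 3, ...)
```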
Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors
Phung, Tung, Pădurean, Victor-Alexandru, Cambronero, José, Gulwani, Sumit, Kohn, Tobias, Majumdar, Rupak, Singla, Adish, Soares, Gustavo
Generative AI and large language models hold great promise for enhancing computing education by powering next-generation educational technologies for introductory programming. Recent works have studied these models in different scenarios relevant to programming education; however, they are limited in that they typically consider already-outdated models or only specific scenarios. Consequently, a systematic study benchmarking state-of-the-art models across a comprehensive set of programming education scenarios has been lacking. In our work, we systematically evaluate two models, ChatGPT (based on GPT-3.5) and GPT-4, and compare their performance with that of human tutors across a variety of scenarios. We evaluate using five introductory Python programming problems and real-world buggy programs from an online platform, assessing performance with expert-based annotations. Our results show that GPT-4 drastically outperforms ChatGPT (based on GPT-3.5) and comes close to human tutors' performance in several scenarios. These results also highlight settings where GPT-4 still struggles, suggesting exciting future directions for developing techniques to improve these models' performance.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.85)
PAC Prediction Sets for Large Language Models of Code
Khakhar, Adam, Mell, Stephen, Bastani, Osbert
Prediction sets have recently been shown to be a promising strategy for quantifying the uncertainty of deep neural networks in a way that provides theoretical guarantees. However, existing techniques have largely targeted settings where the space of labels is simple, so prediction sets can be arbitrary subsets of labels. For structured prediction problems where the space of labels is exponential in size, even prediction sets containing a small fraction of all labels can be exponentially large. In the context of code generation, we propose a solution that considers a restricted set of prediction sets that can be compactly represented as partial programs: programs with portions replaced by holes. Given a trained code generation model, our algorithm leverages the programming language's abstract syntax tree to generate a set of programs such that the correct program is in the set with high confidence. Valuable applications of our algorithm include a Codex-style code generator that leaves holes in uncertain parts of the generated code, providing a partial program with theoretical guarantees. We evaluate our approach on PICARD (a T5 model for SQL semantic parsing) and Codex (a GPT model for over a dozen programming languages, including Python), demonstrating that our approach generates compact PAC prediction sets. This is the first research contribution that generates PAC prediction sets for generative code models.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
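A hedged sketch of the representational idea above: starting from one predicted AST with per-node confidences, subtrees below a confidence threshold are replaced by holes, yielding a partial program that covers many concrete completions. The PAC part, calibrating the threshold on held-out data so the true program is covered with high probability, is elided; all names, the example AST, and its confidences are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    conf: float                 # model confidence for this node
    kids: list = field(default_factory=list)

HOLE = Node("?", 1.0)

def holed(tree, tau):
    """Replace every subtree rooted at a node with conf < tau by a hole."""
    if tree.conf < tau:
        return HOLE
    return Node(tree.label, tree.conf, [holed(k, tau) for k in tree.kids])

def show(t):
    return t.label if not t.kids else \
        f"{t.label}({', '.join(show(k) for k in t.kids)})"

# Example AST for `sorted(xs, key=len)` with made-up confidences.
ast = Node("call", 0.99, [
    Node("sorted", 0.98),
    Node("xs", 0.95),
    Node("key=", 0.90, [Node("len", 0.40)]),
])
for tau in (0.3, 0.5):
    print(tau, "->", show(holed(ast, tau)))
# 0.3 -> call(sorted, xs, key=(len))
# 0.5 -> call(sorted, xs, key=(?))
```

Raising the threshold trades specificity for coverage: more holes mean a larger prediction set, which is exactly the knob the calibration step tunes.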
Improved Tree Search for Automatic Program Synthesis
In the task of automatic program synthesis, one obtains pairs of matching inputs and outputs and generates a computer program, in a particular domain-specific language (DSL), which given each sample input returns the matching output. A key element is being able to perform an efficient search in the space of valid programs. Here, we ...

However, as reported by previous work (Zohar & Wolf, 2018; Chen et al., 2019), employing a reinforcement learning approach, as opposed to training using a maximum likelihood loss to generate the single program that is available as the ground truth, either hurts performance or leads to a small increase in performance. This is despite training the MLE approach in a teacher-forcing way, in which, during training and unlike during test time, the partial programs considered are the prefix of the ground truth programs.
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
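The MLE/teacher-forcing setup that the recovered passage contrasts with reinforcement learning looks roughly like the sketch below (a tiny LSTM stands in for whatever model the paper actually uses; only the loss matters here): each next token of the single ground-truth program is predicted from its true prefix, which is exactly the train/test mismatch the passage points out.

```python
import torch
import torch.nn as nn

VOCAB, DIM = 32, 64
embed = nn.Embedding(VOCAB, DIM)
model = nn.LSTM(DIM, DIM, batch_first=True)
head = nn.Linear(DIM, VOCAB)

def teacher_forcing_loss(program_tokens):
    """Cross-entropy of token t+1 given ground-truth tokens <= t. At test
    time the model instead consumes its own (possibly wrong) prefix, which
    is the train/test mismatch noted in the passage above."""
    x = embed(program_tokens[:, :-1])      # inputs: true prefixes
    y = program_tokens[:, 1:]              # targets: next tokens
    h, _ = model(x)
    logits = head(h)
    return nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), y.reshape(-1))

batch = torch.randint(0, VOCAB, (8, 12))   # fake tokenized programs
print(teacher_forcing_loss(batch).item())  # ~log(32) before any training
```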
Planning with Large Language Models for Code Generation
Zhang, Shun, Chen, Zhenfang, Shen, Yikang, Ding, Mingyu, Tenenbaum, Joshua B., Gan, Chuang
Existing large language model-based code generation pipelines typically use beam search or sampling algorithms during the decoding process. Although the programs they generate achieve high token-matching-based scores, they often fail to compile or produce incorrect outputs. The main reason is that conventional Transformer decoding algorithms may not be the best choice for code generation. In this work, we propose a novel Transformer decoding algorithm, Planning-Guided Transformer Decoding (PG-TD), that uses a planning algorithm to perform lookahead search and guide the Transformer toward generating better programs. Specifically, instead of simply optimizing the likelihood of the generated sequences, the Transformer makes use of a planner to generate candidate programs and test them on public test cases. The Transformer can therefore make more informed decisions and generate tokens that eventually lead to higher-quality programs. We also design a mechanism that shares information between the Transformer and the planner to make our algorithm computationally efficient. We empirically evaluate our framework with several large language models as backbones on public coding-challenge benchmarks, showing that 1) it generates programs that consistently achieve higher performance than competing baseline methods, and 2) it enables controllable code generation, such as producing concise or highly commented code, by optimizing a modified objective.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Dominican Republic (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
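A drastically simplified sketch of planning-guided decoding, with a uniform random stub in place of the Transformer and a three-character expression task in place of a coding benchmark: each candidate next token is scored by the best public-test pass rate seen among sampled completions starting with it, which is the lookahead signal PG-TD's planner provides. Everything here is illustrative, not the paper's algorithm.

```python
import random

VOCAB = list("xy+1")
PUBLIC_TESTS = [({"x": 2, "y": 3}, 5), ({"x": 1, "y": 4}, 5)]
MAX_LEN = 3

def pass_rate(code):
    try:
        return sum(eval(code, {}, dict(env)) == out
                   for env, out in PUBLIC_TESTS) / len(PUBLIC_TESTS)
    except Exception:
        return 0.0

def sample_completion(prefix):
    while len(prefix) < MAX_LEN:
        prefix += random.choice(VOCAB)     # stand-in for LM sampling
    return prefix

def pg_td_step(prefix, rollouts=500):
    """Lookahead: a token's value is the best test pass-rate seen in any
    rollout beginning with it (a stand-in for PG-TD's tree search)."""
    return max(VOCAB, key=lambda tok: max(
        pass_rate(sample_completion(prefix + tok)) for _ in range(rollouts)))

random.seed(0)
prefix = ""
while len(prefix) < MAX_LEN:
    prefix += pg_td_step(prefix)
print(prefix, pass_rate(prefix))           # expect: x+y 1.0
```

Plain likelihood-guided decoding has no access to `pass_rate` at all; routing test feedback into token selection is the whole trick, and the paper's information-sharing mechanism exists to avoid re-running rollouts like this naive version does.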
Scaling Neural Program Synthesis with Distribution-based Search
Fijalkow, Nathanaël, Lagarde, Guillaume, Matricon, Théo, Ellis, Kevin, Ohlmann, Pierre, Potta, Akarsh
We consider the problem of automatically constructing computer programs from input-output examples. We investigate how to augment probabilistic and neural program synthesis methods with new search algorithms, proposing a framework called distribution-based search. Within this framework, we introduce two new search algorithms: Heap Search, an enumerative method, and SQRT Sampling, a probabilistic method. We prove certain optimality guarantees for both methods, show how they integrate with probabilistic and neural techniques, and demonstrate how they can operate at scale across parallel compute environments. Together, these contributions offer a theoretical and applied study of search algorithms for program synthesis that integrates with recent developments in machine-learned program synthesizers.
- Europe > France > Île-de-France > Paris > Paris (0.14)
- North America > United States > Florida > Hillsborough County > University (0.04)
- Europe > United Kingdom (0.04)
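The advantage of SQRT Sampling can be seen on a finite toy space (the paper's algorithms work over unbounded, grammar-defined spaces, so this shows only the flavor): if the hidden target program is distributed like the model's distribution p, a sampler drawing i.i.d. from q needs 1/q(target) draws in expectation, and drawing from q ∝ √p provably lowers the expected cost relative to sampling from p itself.

```python
import math

# Toy space: 100 candidate programs with skewed model scores.
programs = [f"p{i}" for i in range(100)]
weights = [1.0 / (i + 1) ** 2 for i in range(100)]
Z = sum(weights)
p = {prog: w / Z for prog, w in zip(programs, weights)}

# SQRT Sampling's proposal: q proportional to sqrt(p).
sqrt_Z = sum(math.sqrt(v) for v in p.values())
q = {prog: math.sqrt(v) / sqrt_Z for prog, v in p.items()}

def expected_draws(strategy, truth):
    """E[#draws until the target appears]: target ~ truth, draws i.i.d. ~ strategy.
    A geometric wait with success prob strategy[x] has mean 1/strategy[x]."""
    return sum(truth[x] / strategy[x] for x in truth)

print("sampling from p      :", round(expected_draws(p, p), 1))  # exactly 100.0
print("sampling from sqrt(p):", round(expected_draws(q, p), 1))  # ~16.5
```

The second number is (Σ√p)², which by Cauchy-Schwarz never exceeds the first; Heap Search instead enumerates candidates in decreasing p, which the paper shows is optimal among deterministic enumeration orders.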
Representing Partial Programs with Blended Abstract Semantics
Nye, Maxwell, Pu, Yewen, Bowers, Matthew, Andreas, Jacob, Tenenbaum, Joshua B., Solar-Lezama, Armando
Synthesizing programs from examples requires searching over a vast, combinatorial space of possible programs. In this search process, a key challenge is representing the behavior of a partially written program before it can be executed, to judge if it is on the right track and predict where to search next. We introduce a general technique for representing partially written programs in a program synthesis engine. We take inspiration from the technique of abstract interpretation, in which an approximate execution model is used to determine if an unfinished program will eventually satisfy a goal specification. Here we learn an approximate execution model implemented as a modular neural network. By constructing compositional program representations that implicitly encode the interpretation semantics of the underlying programming language, we can represent partial programs using a flexible combination of concrete execution state and learned neural representations, using the learned approximate semantics when concrete semantics are not known (in unfinished parts of the program). We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs, such as loops and higher-order functions, and can be used to synthesize programs more accurately for a given search budget than pure neural approaches in several domains.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
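A hedged sketch of blended evaluation on a toy arithmetic DSL: complete subtrees execute concretely, holes contribute a learned vector, and a per-operator neural module combines child representations whenever any child is abstract. The modules below are untrained stand-ins for the paper's learned approximate semantics, and all names are assumptions.

```python
import torch
import torch.nn as nn

DIM = 16
hole_vec = nn.Parameter(torch.randn(DIM))          # representation of a hole
lift = nn.Linear(1, DIM)                           # concrete value -> vector
modules = {op: nn.Linear(2 * DIM, DIM) for op in ("add", "mul")}
CONCRETE = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def blended_eval(tree, x):
    """Return ('conc', number) when fully executable, else ('abs', vector)."""
    if tree == "HOLE":
        return ("abs", hole_vec)
    if tree == "x":
        return ("conc", float(x))
    if isinstance(tree, (int, float)):
        return ("conc", float(tree))
    op, l, r = tree
    lv, rv = blended_eval(l, x), blended_eval(r, x)
    if lv[0] == rv[0] == "conc":                   # concrete semantics apply
        return ("conc", CONCRETE[op](lv[1], rv[1]))
    # Otherwise fall back to the learned approximate semantics for `op`.
    emb = lambda v: v[1] if v[0] == "abs" else lift(torch.tensor([v[1]]))
    return ("abs", torch.tanh(modules[op](torch.cat([emb(lv), emb(rv)]))))

print(blended_eval(("add", "x", 1), x=2))          # ('conc', 3.0)
kind, vec = blended_eval(("mul", ("add", "x", "HOLE"), 2), x=2)
print(kind, vec.shape)                             # abs torch.Size([16])
```

In a synthesizer, the resulting vector would feed a goal-conditioned scorer that decides whether the partial program is "on the right track"; the key design point is that abstraction is introduced only where holes force it, so finished subtrees keep their exact concrete values.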