Program Synthesis with Pragmatic Communication
Program synthesis techniques construct or infer programs from user-provided specifications, such as input-output examples. Yet most specifications, especially those given by end-users, leave the synthesis problem radically ill-posed, because many programs may simultaneously satisfy the specification.
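The ill-posedness can be seen in a toy setting. The sketch below (an illustration, not the paper's method) enumerates a tiny hypothetical DSL of integer functions and collects every program consistent with the given input-output examples; with one example, several distinct programs all satisfy the specification.

```python
# Illustrative sketch: a toy DSL of integer functions. With few
# input-output examples, many programs satisfy the specification,
# making the synthesis problem ill-posed.
CANDIDATES = [
    ("double", lambda x: x * 2),
    ("add_x", lambda x: x + x),
    ("square", lambda x: x * x),
    ("plus_two", lambda x: x + 2),
    ("identity", lambda x: x),
]

def consistent_programs(examples):
    """Return the names of all candidates matching every (input, output) pair."""
    return [name for name, f in CANDIDATES
            if all(f(i) == o for i, o in examples)]

# A single example (2 -> 4) leaves four satisfying programs:
print(consistent_programs([(2, 4)]))
# Adding (3 -> 6) rules out squaring and adding two:
print(consistent_programs([(2, 4), (3, 6)]))
```

Even after the second example, `double` and `add_x` remain indistinguishable because they are semantically equivalent on integers, so extra examples alone cannot break every tie.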
ExPairT-LLM: Exact Learning for LLM Code Selection by Pairwise Queries
Tom Yuviler, Dana Drachsler-Cohen
Despite recent advances in LLMs, the task of code generation is still challenging. To cope, code selection algorithms select the best program from multiple programs generated by an LLM. However, existing algorithms can fail to identify the correct program, either because they fail to distinguish nonequivalent programs or because they rely on an LLM and assume it always correctly determines the output for every input. We present ExPairT-LLM, an exact learning algorithm for code selection that selects a program by posing two new types of queries to an LLM oracle: pairwise membership and pairwise equivalence. These queries are simpler for LLMs and enable ExPairT-LLM to identify the correct program through a tournament, which is robust to some LLM mistakes. We evaluate ExPairT-LLM on four popular code datasets. Its pass@1 (success rate) outperforms the state-of-the-art code selection algorithm on average by +13.0% and up to +27.1%. It also improves the pass@1 of LLMs performing complex reasoning by +24.0%.
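The tournament idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the hypothetical `prefer` function stands in for the LLM oracle, here judging candidates against a hidden reference instead of posing pairwise queries.

```python
# Sketch of tournament-style code selection with a pairwise oracle.
# ExPairT-LLM queries an LLM; here `prefer` is a hypothetical stand-in
# that, given two candidate programs and some inputs, returns the one
# judged more likely correct.

def tournament_select(programs, inputs, prefer):
    """Single-elimination tournament: repeatedly keep the pairwise winner.

    Because candidates are compared only pairwise, a single oracle
    mistake eliminates at most one program rather than corrupting a
    global ranking, which is the robustness property the tournament buys.
    """
    pool = list(programs)
    while len(pool) > 1:
        next_round = []
        # Pair adjacent candidates; an odd one out advances automatically.
        for i in range(0, len(pool) - 1, 2):
            next_round.append(prefer(pool[i], pool[i + 1], inputs))
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])
        pool = next_round
    return pool[0]

# Toy demo: candidates are callables; the stand-in oracle scores each
# against a hidden reference (an LLM would instead judge the outputs).
reference = lambda x: x * x

def prefer(p, q, inputs):
    score = lambda f: sum(f(x) == reference(x) for x in inputs)
    return p if score(p) >= score(q) else q

candidates = [lambda x: x + x, lambda x: x * x, lambda x: x ** 3, lambda x: 0]
best = tournament_select(candidates, inputs=[2, 3], prefer=prefer)
print(best(4))  # the surviving program computes 4 * 4 = 16
```

A round-robin or scoring scheme would also work, but single elimination needs only O(n) oracle calls for n candidates.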
We would like to thank all three reviewers for their thoughtful comments.

R3 saw our approach as "very similar to the standard approach for neural [...]". We believe our model actually differs significantly from previous approaches in this regard. While our code is able to perform a "double attention" mechanism, this work does not use these features of [...]. We thank R1 and apologize for this confusion.

According to R2, our paper "shows quite convincingly that neural program [...]". Our revision will report this experiment and move the discussion of the heuristics to the main text.

Our approach utilizes test-time search, which R3 also suggests is a disadvantage: "The results of [the no-search ...]". In that sense, our approach offers more robustness than a neural-only model would allow.

The reviewers note that our model uses strong supervision in the form of a meta-grammar. In a sense, we agree with R2: "Now that this paper has shown [...]". This demonstrates both generalization and graceful degradation on grammars with 3x the number of rules seen in training.