zoea
Shrinking the Inductive Programming Search Space with Instruction Subsets
Inductive programming frequently relies on some form of search in order to identify candidate solutions. However, the size of the search space limits the use of inductive programming to the production of relatively small programs. If we could correctly predict the subset of instructions required for a given problem, then inductive programming would be more tractable. We show that this can be achieved in a high percentage of cases. This paper presents a novel model of programming language instruction co-occurrence that was built to support search space partitioning in the Zoea distributed inductive programming system. The model consists of a collection of intersecting instruction subsets derived from a large sample of open source code. Using this approach, different parts of the search space can be explored in parallel. The number of subsets required does not grow linearly with the quantity of code used to produce them, and a manageable number of subsets is sufficient to cover a high percentage of unseen code. This approach also significantly reduces the overall size of the search space, often by many orders of magnitude.
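The partitioning idea in this abstract can be illustrated with a small sketch. It is not Zoea's implementation: the instruction names, the subsets, and the cost model (a search space of roughly n^k candidates for programs of k instructions drawn from n available instructions) are all assumptions made for illustration. The sketch shows how a program counts as covered when one subset contains every instruction it uses, and how searching several small subsets can be far cheaper than searching the full instruction set.

```python
from typing import FrozenSet, Iterable, List


def is_covered(program_instructions: Iterable[str],
               subsets: List[FrozenSet[str]]) -> bool:
    """Return True if a single subset contains every instruction used."""
    needed = frozenset(program_instructions)
    return any(needed <= subset for subset in subsets)


def naive_space(num_instructions: int, program_length: int) -> int:
    """Assumed cost model: n ** k instruction sequences of length k."""
    return num_instructions ** program_length


def partitioned_space(subsets: List[FrozenSet[str]],
                      program_length: int) -> int:
    """Sum of per-subset spaces; each term could be searched in parallel.

    Overlapping subsets are counted more than once, so this is an upper bound.
    """
    return sum(len(subset) ** program_length for subset in subsets)


# Hypothetical instruction names and subsets, chosen only for illustration.
ALL_INSTRUCTIONS = frozenset(
    {"add", "sub", "mul", "concat", "length", "substr", "index", "reverse"}
)
SUBSETS = [
    frozenset({"add", "sub", "mul"}),
    frozenset({"concat", "length", "substr"}),
    frozenset({"add", "length", "index"}),
]

print(is_covered(["add", "mul"], SUBSETS))     # True
print(is_covered(["mul", "concat"], SUBSETS))  # False
print(naive_space(len(ALL_INSTRUCTIONS), 4))   # 4096
print(partitioned_space(SUBSETS, 4))           # 243
```

In this toy setting the program using "mul" and "concat" is not covered, which corresponds to the minority of unseen code that falls outside every subset. The gap between the two search space sizes widens quickly as program length grows.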
Architecture and Knowledge Representation for Composable Inductive Programming
We present an update on the current architecture of the Zoea knowledge-based, Composable Inductive Programming system. The Zoea compiler is built using a modern variant of the blackboard architecture. Zoea integrates a large number of knowledge sources that encode different aspects of programming language and software development expertise. We describe the use of synthetic test cases as a ubiquitous form of knowledge and hypothesis representation that supports a variety of reasoning strategies. Some future plans are also outlined.
The Composability of Intermediate Values in Composable Inductive Programming
It is believed that mechanisms including intermediate values enable composable inductive programming (CIP) to be used to produce software of any size. We present the results of a study that investigated the relationships between program size, the number of intermediate values and the number of test cases used to specify programs using CIP. In the study 96,000 programs of various sizes were randomly generated, decomposed into fragments and transformed into test cases. The test cases were then used to regenerate new versions of the original programs using Zoea. The results show linear relationships between the number of intermediate values and regenerated program size, and between the number of test cases and regenerated program size within the size range studied. In addition, as program size increases there is increasing scope for trading off the number of test cases against the number of intermediate values and vice versa.
Zoea -- Composable Inductive Programming Without Limits
The abstraction levels represent a general progression from the test cases through available and derived values to partial and complete solutions. The abstraction levels include:
- test cases;
- input and output elements;
- derived values (symbolic and numeric);
- code fragments;
- target values;
- case solutions;
- case set solutions;
- program solutions;
- solution code.
The data on the blackboard represents a set of more or less promising solution fragments at different stages of identification, characterisation and elaboration. It is worth noting that progression from test cases to solution code is not a strictly linear process. Instead knowledge sources respond to changes at one or more specific abstraction levels to produce, enhance or remove elements on different levels. The blackboard model allows this to happen in more or less any order.
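The way knowledge sources react to changes at particular abstraction levels can be sketched as a minimal blackboard loop. This is an illustrative toy, not Zoea's architecture: the `Blackboard` and `KnowledgeSource` classes, the three knowledge sources, and the single test case are hypothetical, and only four of the levels listed above are used. Removal and enhancement of elements are omitted; the sketch only adds entries.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, List, Tuple

Entry = Tuple[str, object]  # (abstraction level, value)


@dataclass
class KnowledgeSource:
    name: str
    watches: List[str]  # levels whose changes trigger this source
    react: Callable[["Blackboard", str, object], List[Entry]]


class Blackboard:
    def __init__(self) -> None:
        self.levels = defaultdict(list)
        self.sources: List[KnowledgeSource] = []
        self._changes = deque()

    def post(self, level: str, value: object) -> None:
        """Add a value to a level and record the change, skipping duplicates."""
        if value not in self.levels[level]:
            self.levels[level].append(value)
            self._changes.append((level, value))

    def run(self) -> None:
        """Dispatch each change to every source that watches its level."""
        while self._changes:
            level, value = self._changes.popleft()
            for source in self.sources:
                if level in source.watches:
                    for new_level, new_value in source.react(self, level, value):
                        self.post(new_level, new_value)


ELEMENTS = "input and output elements"


def split_case(board, level, case):
    # A test case here is a hypothetical (input, output) pair.
    return [(ELEMENTS, ("input", case[0])), (ELEMENTS, ("output", case[1]))]


def double_input(board, level, element):
    kind, value = element
    return [("derived values", ("input * 2", value * 2))] if kind == "input" else []


def match_output(board, level, _value):
    # Inspects the board state, so it works whichever entry arrived first.
    outputs = {v for kind, v in board.levels[ELEMENTS] if kind == "output"}
    return [("code fragments", expr)
            for expr, v in board.levels["derived values"] if v in outputs]


board = Blackboard()
board.sources = [
    KnowledgeSource("split_case", ["test cases"], split_case),
    KnowledgeSource("double_input", [ELEMENTS], double_input),
    KnowledgeSource("match_output", [ELEMENTS, "derived values"], match_output),
]
board.post("test cases", (3, 6))
board.run()
print(board.levels["code fragments"])  # ['input * 2']
```

Because `match_output` is triggered by changes on two levels and reads whatever is currently on the board, the same fragment is found regardless of whether the output element or the derived value is posted first, which mirrors the order-independence described above.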