Reviews: Sampling for Bayesian Program Learning

Neural Information Processing Systems 

I found this paper interesting and well-written, but I have some significant questions and comments about the approach. The paper argues that sampling is useful because we can find the C most frequently sampled programs and show them to a user. As shown in Figure 6, a correct program is more likely to appear among the top 3 programs than as the top 1.

But if we want to show the top C programs, do we really need to perform sampling, which the paper says is complicated by the existence of many long and unlikely programs that match the training examples? Why can't we simply find the MDL program and then run the solver again with length restrictions to find other consistent programs of the same length, or slightly longer lengths?
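To make the suggested alternative concrete, here is a minimal sketch of the enumerate-by-length idea. It is purely illustrative: the toy DSL (sequences of unary primitives), the `slack` parameter, and the function names are all hypothetical and not taken from the paper, which uses a solver rather than brute-force enumeration.

```python
from itertools import product

# Hypothetical toy DSL: a program is a sequence of primitive unary functions.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
}

def run(program, x):
    for name in program:
        x = PRIMITIVES[name](x)
    return x

def consistent(program, examples):
    return all(run(program, xi) == yi for xi, yi in examples)

def top_c_by_length(examples, c, slack=1, max_len=6):
    """Enumerate programs in order of increasing length. Once the MDL
    (shortest consistent) length is found, keep collecting consistent
    programs up to `slack` tokens longer, then return the first c."""
    found = []
    mdl_len = None
    for length in range(1, max_len + 1):
        if mdl_len is not None and length > mdl_len + slack:
            break  # length restriction: stop slightly past the MDL length
        for program in product(PRIMITIVES, repeat=length):
            if consistent(program, examples):
                if mdl_len is None:
                    mdl_len = length
                found.append(program)
    return found[:c]

examples = [(1, 4), (2, 9)]  # target behavior: square(inc(x))
print(top_c_by_length(examples, c=3))  # → [('inc', 'square')]
```

In a real implementation the inner enumeration would be replaced by repeated solver calls with an added length constraint, but the control flow (solve for the MDL program, then re-solve at the same or slightly longer lengths while blocking already-found programs) would be the same.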