Reviews: Write, Execute, Assess: Program Synthesis with a REPL

Neural Information Processing Systems 

"Given a large enough time budget the 'no REPL' baseline is competitive with our ablated alternatives." However, the policy rollout baseline is trained with RL on a single machine, which makes it difficult to explore using entropy-based methods or epsilon-greedy. Using multiple actors in an asynchronous setting (and then doing policy rollouts) would be a stronger and fairer baseline for comparison with the SMC approach. I expect SMC to do well, but this is an important empirical question (other cited methods, such as Ganin et al., seem to do this in the same context).

"The value-guided SMC sampler leads to the highest overall number of correct programs, requiring less time and fewer nodes expanded compared to other inference techniques." How well does an SMC sampler work without value-guided proposals in both case studies?
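
To make the requested ablation concrete, here is a minimal toy sketch of what "SMC with and without value guidance" could look like. It is not the authors' implementation: the bit-string task and the names `smc_search`, `propose`, `is_solution`, and `prefix_value` are all hypothetical, and the value function is assumed to enter only as a resampling weight. Passing `value=None` gives uniform weights, which corresponds to the unguided sampler asked about above.

```python
import random
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # toy stand-in for a partial program


def smc_search(
    propose: Callable[[State, random.Random], State],
    is_solution: Callable[[State], bool],
    value: Optional[Callable[[State], float]] = None,
    n_particles: int = 32,
    n_steps: int = 8,
    seed: int = 0,
) -> Optional[State]:
    """Toy SMC search: extend, check, weight, resample (illustrative only)."""
    rng = random.Random(seed)
    particles: List[State] = [()] * n_particles
    for _ in range(n_steps):
        # Extend every particle by one step sampled from the proposal.
        particles = [propose(p, rng) for p in particles]
        for p in particles:
            if is_solution(p):
                return p
        # Value-guided weights, or uniform weights for the ablation.
        if value is None:
            weights = [1.0] * len(particles)
        else:
            weights = [max(value(p), 1e-12) for p in particles]
        # Resample with replacement in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return None


# Hypothetical toy task: recover a fixed bit string one bit at a time.
TARGET: State = (1, 0, 1, 1)


def propose(state: State, rng: random.Random) -> State:
    return state + (rng.randint(0, 1),)


def is_solution(state: State) -> bool:
    return state == TARGET


def prefix_value(state: State) -> float:
    # Assumed oracle-like value: high only if the state is a prefix of TARGET.
    return 1.0 if state == TARGET[: len(state)] else 1e-6


guided = smc_search(propose, is_solution, value=prefix_value, n_steps=len(TARGET))
unguided = smc_search(propose, is_solution, value=None, n_steps=len(TARGET))
```

Comparing the success rates of the two calls over many seeds and particle budgets is the kind of measurement that would answer the question for this toy setting; the paper's two case studies would need the same comparison with the learned policy and value networks.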