Review for NeurIPS paper: A Bayesian Perspective on Training Speed and Model Selection

Weaknesses: At Eq. 5, the authors introduce two sampling-based estimators of the lower bound (LB). I am not sure why both are presented as estimators of the LB: the second is an unbiased estimator of the (log) marginal likelihood (ML). While it could technically be regarded as a biased estimator of the LB, I see no reason to introduce it as such, since it is an unbiased estimator of exactly the quantity the authors hope to approximate. Indeed, in the following sentence the authors note that the second estimator's bias decreases as J is increased, which is entirely expected, if not trivial, given the point above. Moreover, when J = 1 the two estimators are algebraically identical, so the first one also becomes a (noisy) unbiased estimator of the ML.
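To make the point concrete, here is a minimal numerical sketch, assuming (as I read Eq. 5) that the first estimator is the mean of log-likelihoods over J posterior samples and the second is the log of the mean of likelihoods; the distributions and variable names below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def estimators(p):
    """Given J hypothetical likelihood values p(D_i | theta_j), return
    the two Eq. 5-style estimators."""
    # First estimator: Monte Carlo mean of log-likelihoods (lower bound).
    lb = np.mean(np.log(p))
    # Second estimator: log of the Monte Carlo mean of likelihoods.
    log_ml = np.log(np.mean(p))
    return lb, log_ml

# Illustrative positive "likelihood" draws standing in for posterior samples.
p = rng.lognormal(mean=-1.0, sigma=0.5, size=1000)

lb, log_ml = estimators(p)
# By Jensen's inequality, mean-of-logs <= log-of-mean, so the first
# estimator sits below the second (the lower-bound gap).
assert lb <= log_ml

# With J = 1 the two expressions coincide algebraically, so the gap vanishes.
lb1, log_ml1 = estimators(p[:1])
assert np.isclose(lb1, log_ml1)
```

Increasing the sample count in this sketch narrows the Jensen gap of the second estimator, consistent with the authors' remark that its bias shrinks as J grows.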