Reviews: Faster Online Learning of Optimal Threshold for Consistent F-measure Optimization

Neural Information Processing Systems 

A) My main concern with this paper concerns the main results (Theorems 2 and 3). The authors do not seem to have paid sufficient attention to the fact that \partial \hat{Q} in Algorithm 2 is a biased estimator of the true gradient \partial Q. Moreover, \hat{Q} as defined in Line 189 depends on \hat{\pi}, which is itself an estimate of \pi. A rigorous probabilistic proof would therefore need to examine the conditional probability of the estimation error of Q given the estimation of \pi.
B) Independently of the above, the final high-probability statements in Theorems 2 and 3 seem to be missing a union bound that accounts for the failure probability in Assumption 1.
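To make point B concrete, here is a minimal sketch of the missing step. The event and probability symbols (E_1, E_2, \delta_1, \delta_2) are hypothetical notation introduced only for illustration; they do not appear in the paper. Let E_1 be the event on which Assumption 1 holds, with P(E_1^c) \le \delta_1, and let E_2 be the event on which the bound of Theorem 2 or 3 holds, with P(E_2^c) \le \delta_2. The guarantee is only valid on E_1 \cap E_2, so the stated confidence level must be reduced accordingly:

```latex
% Union bound: the theorem's bound holds only when both events occur
P(E_1 \cap E_2) = 1 - P(E_1^c \cup E_2^c)
               \ge 1 - P(E_1^c) - P(E_2^c)
               \ge 1 - \delta_1 - \delta_2
```

If the theorems currently state confidence 1 - \delta_2 alone, the \delta_1 term from Assumption 1 appears to be missing.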