Review for NeurIPS paper: Optimal Prediction of the Number of Unseen Species with Multiplicity

Neural Information Processing Systems 

Additional Feedback: This paper studies a variant of Fisher et al.'s unseen species problem: predicting the number of new symbols that appear at least \mu times in a future (unobserved) sample of size a \times n, on the basis of an existing sample of size n. This extends the results of Orlitsky et al. [22], which focus on \mu = 1, the original setting of Fisher et al. The main findings are:

- Theorem 1: an estimator constructed using the smoothing technique from [22] achieves a normalized prediction error of n^{-\Omega(1/a)}, provided a = O(\log n / \mu).
- Theorem 2: a minimax lower bound of n^{-O(1/a)} is shown, provided a = \Omega(\log n / \mu).

Both the construction and the analysis closely follow those in [22]. Specifically, the upper bound is obtained by following the smoothed-estimator recipe (modifying the unbiased estimator), with an analysis that uses Poisson sampling and relies on Bessel functions to control the bias from cancellation; the lower bound is obtained by a reduction to the support size estimation problem.
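
For concreteness, the quantity being predicted can be sketched in a few lines of Python. This is only an illustration of the problem statement as summarized in this review, not the paper's estimator: the function name `unseen_with_multiplicity` and the toy samples are hypothetical, and the function computes the ground-truth target from a fully known future sample, which the estimator of course does not get to see.

```python
from collections import Counter


def unseen_with_multiplicity(observed, future, mu):
    """Count symbols absent from `observed` that occur at least `mu` times in `future`.

    Hypothetical helper for illustration: `observed` plays the role of the
    existing sample of size n, `future` the unobserved sample of size a * n.
    """
    seen = set(observed)
    future_counts = Counter(future)
    return sum(
        1
        for symbol, count in future_counts.items()
        if symbol not in seen and count >= mu
    )


# Toy data (made up): n = 4 observed symbols, a = 2, so the future sample has 8.
observed = ["a", "b", "a", "c"]
future = ["d", "d", "e", "a", "f", "f", "f", "b"]

# mu = 1 recovers the classical unseen species count (new symbols d, e, f).
print(unseen_with_multiplicity(observed, future, 1))
# mu = 2 keeps only the new symbols seen at least twice (d, f).
print(unseen_with_multiplicity(observed, future, 2))
```

Setting mu = 1 recovers the original setting of Fisher et al.; larger mu only ever shrinks the count, since a symbol appearing at least mu + 1 times also appears at least mu times.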