Review for NeurIPS paper: Probabilistic Linear Solvers for Machine Learning

Neural Information Processing Systems 

Strengths: EDIT after rebuttal: Thank you to the authors for clarifying the following:
- GP regression on log(Rayleigh_i): satisfactory reply; this algorithm takes into account uncertainty about eigenvalues beyond t 1.
- Transfer learning: reusing the posterior covariance as a prior makes the method converge faster than reusing only the mean.

I'm still confused about the following, but a little less than before:
- Empirical Bayes: this is indeed common, and in many applications the prior is updated as more data come in. For example, in Bayesian optimization the GP hyperparameters are re-optimized after each new point is acquired. What is unusual here, as the authors clarified in the rebuttal, is that the prior used at each time step *contains future observations*. Does this imply that the posterior covariance cannot be computed in the middle of the algorithm, i.e. before it terminates and the full S matrix is available?