the NLLs in the final version of the paper in addition to reporting averages and standard deviations in all of our other

Neural Information Processing Systems 

We agree with all three reviewers that evaluating the predictive variances is important. We will clarify that SGPR is due to Titsias (2009) and SVGP is due to Hensman et al. (2013). This has important ramifications. In contrast, using CG requires exactly 2w exchanges to perform a linear solve. We were unaware of Nguyen's paper at submission, and we will add this discussion to the paper. We note that the precomputation, like CG, can be run to a specified tolerance.
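To illustrate what running CG "to a specified tolerance" means, here is a minimal single-machine sketch of conjugate gradients. It stops once the relative residual ||b - Ax|| / ||b|| drops below `tol`. The function name `conjugate_gradients`, the RBF kernel, and the noise level are illustrative assumptions, not the paper's implementation. The sketch also does not model the distributed communication (the 2w exchanges) mentioned above.

```python
import numpy as np


def conjugate_gradients(A, b, tol=1e-6, max_iter=None):
    """Solve A x = b for symmetric positive-definite A (illustrative sketch).

    Stops when the relative residual ||b - A x|| / ||b|| <= tol,
    or after max_iter iterations. Returns (x, iterations_used).
    """
    n = b.shape[0]
    if max_iter is None:
        max_iter = n  # exact arithmetic converges in at most n steps
    x = np.zeros(n)
    r = b - A @ x  # initial residual
    p = r.copy()   # initial search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    if b_norm == 0:
        return x, 0
    for k in range(max_iter):
        # Tolerance check: the knob that trades accuracy for iterations.
        if np.sqrt(rs_old) <= tol * b_norm:
            return x, k
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # next A-conjugate direction
        rs_old = rs_new
    return x, max_iter


# Usage on a hypothetical kernel system (K + sigma^2 I) alpha = y.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists) + 0.5 * np.eye(200)  # assumed RBF kernel + noise
y = rng.standard_normal(200)

alpha_tight, iters_tight = conjugate_gradients(K, y, tol=1e-6)
alpha_loose, iters_loose = conjugate_gradients(K, y, tol=1e-2)
print(iters_loose, iters_tight)
```

A looser tolerance reuses the same residual sequence and simply stops earlier, so it never needs more iterations than a tighter one.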