Review for NeurIPS paper: Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes


Additional Feedback: I'd like to see main paper Figure 1 / supplementary Figure 4.1 expanded. Two questions that I don't think the figure currently answers are: (1) how does the variance in the final \sigma^2_f across trials compare to that of a full-batch GP, and (2) if full-batch GPs have smaller variance, do much larger batch sizes (e.g., m = 1000) decrease this variance further? In Figure 4.1, the variance does not seem to decrease much from m = 16 to m = 64 -- it would be nice to know whether the batch size is the source of the variance. If it is, then running with very large batch sizes, even up to m = 10000, may not be too challenging. On the point of running large batch sizes: while the ability to use SGD will clearly outperform full-batch training beyond some size N (at a guess, probably somewhere in the N = 100k-500k range), I don't think the results in Table 1 are necessarily representative of the settings in which you might actually want to run sgGP or EGP.
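To make the suggested experiment concrete, here is a minimal sketch (on a hypothetical toy objective, not the paper's sgGP/EGP code) of what I have in mind: run repeated SGD trials at several batch sizes m and compare the across-trial variance of the final parameter estimate, which is the quantity I'd like the expanded figure to report.

```python
import numpy as np

# Toy stand-in for GP hyperparameter fitting: estimate the mean of N
# samples by minibatch SGD on the squared loss.  Hypothetical setup,
# purely to illustrate the variance-vs-batch-size comparison.
rng = np.random.default_rng(0)
N = 10_000
data = rng.normal(loc=2.0, scale=1.0, size=N)

def sgd_final_estimate(batch_size, steps=500, lr=0.1, seed=0):
    """Run minibatch SGD and return the final parameter estimate."""
    r = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(steps):
        batch = r.choice(data, size=batch_size, replace=False)
        # Gradient of mean squared loss (theta - x)^2 over the minibatch.
        grad = 2.0 * (theta - batch.mean())
        theta -= lr * grad
    return theta

# Across-trial variance of the final estimate, per batch size.
for m in (16, 64, 1000):
    finals = [sgd_final_estimate(m, seed=s) for s in range(30)]
    print(f"m = {m:4d}  variance of final estimate = {np.var(finals):.2e}")
```

If the across-trial variance shrinks steadily as m grows (as it does in this toy, where gradient noise scales like 1/m), that would pin the variance in Figure 4.1 on the batch size; if it plateaus, the source lies elsewhere.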