Reviews: Coresets for Scalable Bayesian Logistic Regression
Neural Information Processing Systems
Note that in minibatch inference methods, at each iteration a small subset of the data is sampled from the full dataset and used to make an update; these methods exploit redundancy in the data to perform inexpensive updates. In this paper, coresets reduce the total dataset size by, in some sense, approximating the full dataset with a smaller set of weighted examples. However, when coresets are plugged into existing inference algorithms (such as these minibatch algorithms), it seems to me that a very similar procedure will occur: a small subset of the approximate, weighted dataset will be drawn and used to make an update. I am therefore not convinced this would actually speed up inference.

In a sense, I feel the main thing happening here is that the data is approximated in a smaller, compressed form. I can see how this might help with data storage concerns, but I do not see a strong justification for why it would appreciably speed inference over existing minibatch methods, especially since a coreset must be constructed before inference can proceed, which adds to the total inference time of this method. One way to demonstrate a speedup would be timing comparison plots that explicitly show coresets yielding faster inference on large datasets compared to minibatch methods; however, no direct experiments of this sort are given.
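To make the concern concrete, here is a minimal sketch (not from the paper; all names, sizes, and coreset weights are hypothetical placeholders) of a minibatch stochastic-gradient step for logistic regression, run once on the raw dataset and once on a weighted coreset. The per-iteration cost is governed by the batch size in both cases:

```python
import numpy as np

rng = np.random.default_rng(0)

def minibatch_grad(X, y, w, theta, batch_size, rng):
    """One stochastic-gradient estimate for (weighted) logistic regression.

    X: (n, d) features; y: (n,) labels in {-1, +1};
    w: (n,) per-example weights (all ones for the raw dataset,
       coreset weights for the compressed one).
    """
    idx = rng.choice(len(y), size=batch_size, replace=False)
    Xb, yb, wb = X[idx], y[idx], w[idx]
    # gradient of the weighted negative log-likelihood on the batch,
    # rescaled to be an unbiased estimate of the full weighted sum
    margins = yb * (Xb @ theta)
    g = -(wb * yb / (1.0 + np.exp(margins))) @ Xb
    return g * (len(y) / batch_size)

n, m, d = 100_000, 500, 10          # full size, coreset size, dimension
X_full = rng.normal(size=(n, d))
y_full = rng.choice([-1.0, 1.0], size=n)
theta = np.zeros(d)

# step on the raw data: sample 100 of n points
g_full = minibatch_grad(X_full, y_full, np.ones(n), theta, 100, rng)

# step on a stand-in "coreset": sample 100 of m weighted points
# (uniform weights summing to n used here purely as a placeholder)
X_core, y_core = X_full[:m], y_full[:m]
w_core = np.full(m, n / m)
g_core = minibatch_grad(X_core, y_core, w_core, theta, 100, rng)
```

Both calls do O(batch_size × d) work per iteration, which illustrates the point above: subsampling a weighted coreset costs roughly the same per update as subsampling the original dataset, so any speedup must come from elsewhere (e.g., fewer iterations or cheaper data access).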
Jan-20-2025, 08:34:45 GMT