Reviewer 1 - Use of mini-batches: in our experiments, we indeed use mini-batches of size B, by sampling B points
–Neural Information Processing Systems
We would like to thank all reviewers for their valuable feedback and comments. Please find our responses below. This is because it predicts an almost uniform distribution. AdaCVaR also has a lower CVaR than ERM (standard SGD). Thank you for observing that.
Oct-2-2025, 00:33:09 GMT