Review for NeurIPS paper: Optimal Learning from Verified Training Data
Summary and Contributions: This paper considers a linear regression problem in which some of the inputs are strategically corrupted. In particular, the algorithm is given a data vector x, which it uses to approximate y by w·x for some learned value w. However, an adversary can corrupt the value of x to x' in an attempt to make the predicted value closer to a target z. Given a training set of triples (x, y, z), the objective is to learn a w that does as well as possible despite these corruptions. Formally, the adversary sends the algorithm the x' that minimizes (w·x' − z)² + γ‖x' − x‖². In particular, the adversary tries to minimize the error between the prediction and the target z without moving x' too far from the true input x.
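The adversary's objective described above has a closed form: setting the gradient of (w·x' − z)² + γ‖x' − x‖² to zero shows the corruption is a shift of x along w. The sketch below illustrates this; the function name and the specific numbers are my own for illustration and are not taken from the paper.

```python
import numpy as np

def best_response(w, x, z, gamma):
    """Adversary's closed-form best response: the x' minimizing
    (w . x' - z)**2 + gamma * ||x' - x||**2.
    Solving grad = 2*(w.x' - z)*w + 2*gamma*(x' - x) = 0 with the
    ansatz x' = x + alpha*w gives alpha = (z - w.x)/(gamma + ||w||^2)."""
    alpha = (z - w @ x) / (gamma + w @ w)
    return x + alpha * w

# Illustrative check: the corrupted point should achieve a lower
# objective value than the uncorrupted x, and a vanishing gradient.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, 1.0, -0.7])
z, gamma = 2.0, 0.8
x_adv = best_response(w, x, z, gamma)
obj = lambda v: (w @ v - z) ** 2 + gamma * np.sum((v - x) ** 2)
print(obj(x_adv) <= obj(x))  # True: corruption can only help the adversary
```

Note how γ controls the trade-off: as γ → ∞ the shift vanishes and x' → x, while as γ → 0 the adversary moves x' until the prediction exactly hits z.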
This paper focuses on solving the least squares regression problem in a setting where some of the inputs are strategically corrupted. The authors show that, in a certain special case of the general problem, there exists an algorithm that achieves a provably optimal solution. Overall, the results are interesting and target an important problem. The main concerns are the significance of the results (the setting targeted here seems fairly narrow) and the lack of proper context with respect to previous work (which can hopefully be remedied in the revision).
Optimal Learning from Verified Training Data
Standard machine learning algorithms typically assume that data is sampled independently from the distribution of interest. In attempts to relax this assumption, fields such as adversarial learning typically assume that data is provided by an adversary whose sole objective is to fool the learning algorithm. In reality, however, data often comes from self-interested agents with less malicious goals, whose behaviour lies somewhere between these two settings. To tackle this problem, we present a Stackelberg competition model for least squares regression, in which data is provided by agents who wish to achieve specific predictions for their data. Although the resulting optimisation problem is nonconvex, we derive an algorithm that converges to the global optimum, outperforming current approaches that only guarantee convergence to local optima.