Reviews: But How Does It Work in Theory? Linear SVM with Random Features

Neural Information Processing Systems 

The authors analyze the use of random Fourier features (RFF) for training linear support vector machines in the RFF feature space. Their result bounds the difference in expected risk under the hinge loss (i.e., the generalization error on unseen data) in several scenarios. The analysis rests on a number of assumptions: on the noise distribution, on the RKHS containing the optimal classifier, and, most crucially, on access to the optimal feature weights of Bach (2017). The first main result is a fast rate for kernels whose spectrum decays polynomially, when the Bayes classifier lies in the feature space. The second result is a fast rate specifically for the Gaussian kernel when the Bayes classifier is not necessarily in the feature space, but the two classes are separated by some minimum distance.
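
For readers less familiar with the setup under analysis, the following is a minimal NumPy sketch of a linear SVM trained on random Fourier features for the Gaussian kernel. It is an illustration under stated assumptions, not the authors' procedure: frequencies are drawn from the kernel's plain spectral density rather than with the optimized feature weights of Bach (2017) that the analysis assumes, the SVM is fit by simple full-batch subgradient descent on the regularized hinge loss, and the data set (two noisy concentric rings separated by a margin) as well as all function names and hyperparameters are hypothetical choices made for this example.

```python
import numpy as np

def make_rings(n, rng):
    """Toy data (assumption): noisy rings of radius 1 (+1) and 3 (-1)."""
    y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    radius = np.where(y > 0, 1.0, 3.0) + 0.1 * rng.standard_normal(n)
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    X = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
    return X, y

def sample_rff(d, n_features, gamma, rng):
    """Draw RFF parameters for the kernel exp(-gamma * ||x - x'||^2).

    Frequencies come from the kernel's spectral density N(0, 2*gamma*I);
    this is plain (unweighted) sampling, not the optimized weights.
    """
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return W, b

def rff_transform(X, W, b):
    """Map inputs to features z(x) with z(x) . z(x') approximating k(x, x')."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def train_linear_svm(Z, y, lam=1e-3, n_epochs=300, lr=0.5):
    """Full-batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    n, n_features = Z.shape
    w = np.zeros(n_features)
    for t in range(n_epochs):
        active = y * (Z @ w) < 1.0  # points violating the margin
        grad = lam * w - (Z[active].T @ y[active]) / n
        w -= lr / np.sqrt(t + 1.0) * grad
    return w

def accuracy(Z, y, w):
    return float(np.mean(np.sign(Z @ w) == y))

rng = np.random.default_rng(0)
gamma = 0.5
X_train, y_train = make_rings(200, rng)
X_test, y_test = make_rings(200, rng)

# The same random features must be used for training and test data.
W, b = sample_rff(X_train.shape[1], 500, gamma, rng)
Z_train = rff_transform(X_train, W, b)
Z_test = rff_transform(X_test, W, b)

w = train_linear_svm(Z_train, y_train)
train_acc = accuracy(Z_train, y_train, w)
test_acc = accuracy(Z_test, y_test, w)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

The classifier is linear in the 500-dimensional feature space even though the rings are not linearly separable in the input space, and the number of random features is the quantity whose required size the paper's rates are concerned with.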