Reviews: Generalization Properties of Learning with Random Features
–Neural Information Processing Systems
This is, in my opinion, an excellent paper and a significant theoretical contribution to understanding the role of the well-established random feature trick in kernel methods. The authors prove that, for a wide range of optimization tasks in machine learning, random feature based methods yield algorithms whose accuracy is competitive with standard kernel methods while using only \sqrt{n} random features (instead of a linear number; this reduction is what provides scalability). To my knowledge, this is one of the first results rigorously proving that, for downstream applications such as kernel ridge regression, one can use random feature based kernel methods with a relatively small number of random features (the whole point of the random feature approach is to use significantly fewer random features than the dimensionality of the data). So far, most guarantees have been of a point-wise flavor: several papers give upper bounds on the number of random features needed to approximate the value of the kernel accurately for a given pair of feature vectors x and y, but it is not at all clear how these guarantees translate to, for instance, risk guarantees for downstream applications. The authors, however, miss one paper with very relevant results that would be worth comparing with theirs.
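For readers unfamiliar with the setting, the following is a minimal sketch of ridge regression on random Fourier features with the number of features set to roughly \sqrt{n}. It is an illustration only, not the authors' algorithm or experimental setup: the Gaussian kernel, the bandwidth `sigma`, the regularization `lam`, the synthetic sine data, the reading of n as the number of training points, and all function names are assumptions made here.

```python
import numpy as np

def rff_map(X, W, b):
    """Map inputs to random Fourier features approximating a Gaussian kernel."""
    M = W.shape[1]
    return np.sqrt(2.0 / M) * np.cos(X @ W + b)

def fit_rff_ridge(X, y, M, sigma=1.0, lam=1e-3, seed=0):
    """Ridge regression in the M-dimensional random feature space.

    sigma (kernel bandwidth) and lam (regularization) are illustrative
    choices, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / sigma, size=(d, M))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=M)       # random phases
    Z = rff_map(X, W, b)                            # n x M feature matrix
    # Solve the M x M system (Z^T Z + n*lam*I) w = Z^T y
    # instead of an n x n kernel system.
    A = Z.T @ Z + n * lam * np.eye(M)
    w = np.linalg.solve(A, Z.T @ y)
    return W, b, w

def predict(X, W, b, w):
    return rff_map(X, W, b) @ w

# Synthetic one-dimensional regression problem (assumed for illustration).
rng = np.random.default_rng(1)
n = 400
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

M = int(np.ceil(np.sqrt(n)))  # about sqrt(n) random features
W, b, w = fit_rff_ridge(X, y, M)

X_test = np.linspace(-3.0, 3.0, 200)[:, None]
mse = np.mean((predict(X_test, W, b, w) - np.sin(X_test[:, 0])) ** 2)
print(f"M = {M}, test MSE = {mse:.4f}")
```

The point of the sketch is the cost structure: the linear system solved has size M x M rather than n x n, which is where the scalability discussed above comes from when M grows like \sqrt{n}.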
Oct-7-2024, 22:41:30 GMT