random kitchen sinks as an approximation to the kernel machine

#artificialintelligence 

The dimension of $w$ does not make sense to me. To approximate the kernel function with sufficient accuracy, we need a large $D$, which gives me the feeling of a high risk of overfitting — and indeed my model appears to overfit when I try to use it. Shouldn't the true approximation to the kernel machine be a sample average over the drawn $w_j$, something like $k(x, y) \approx \frac{1}{D}\sum_{j=1}^{D} z_{w_j}(x)\, z_{w_j}(y)$? I think the author is trying to fit a linear model in the feature space (since $z(x)$ is the feature map) rather than using the standard kernel trick, which never needs to evaluate the feature map explicitly. But I don't understand why the author does not need to compute the sample average of $K$ (or do something similar to $z$). The implementation here also fits a model with $D$ parameters, with no averaging step, which I find quite confusing.
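For concreteness, here is a minimal sketch of the standard random Fourier feature construction for the RBF kernel (the usual "random kitchen sinks" setup; the `gamma` parameter and function names are my own, not from the post being discussed). Note where the normalization lives: the $\sqrt{2/D}$ factor is folded into $z(x)$ itself, so the inner product $z(x)^\top z(y)$ already is the $\frac{1}{D}$ Monte Carlo average over the $D$ draws of $w$ — which may be why no separate averaging step appears in the implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, D, gamma=1.0, rng=rng):
    """Random Fourier features z(x) approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # For this kernel, w_j ~ N(0, 2*gamma*I) and b_j ~ Uniform[0, 2*pi]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    # The sqrt(2/D) factor bakes the 1/D average into z, so
    # z(x)^T z(y) is already the sample mean over the D draws of w.
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.normal(size=(5, 3))
Z = rff_features(X, D=5000)
K_approx = Z @ Z.T                                   # z(x)^T z(y)
K_exact = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
print(np.max(np.abs(K_approx - K_exact)))            # shrinks as D grows
```

A linear model fitted on `Z` (with $D$ weights) then plays the role of the kernel machine; the overfitting concern raised above is usually handled by regularizing that linear fit, not by an extra averaging step.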
