
Random Kitchen Sinks



Over Parameterized Two-level Neural Networks Can Learn Near Optimal Feature Representations

Fang, Cong, Dong, Hanze, Zhang, Tong

arXiv.org Machine Learning

Recently, over-parameterized neural networks have been extensively analyzed in the literature. However, the previous studies cannot satisfactorily explain why fully trained neural networks are successful in practice. In this paper, we present a new theoretical framework for analyzing over-parameterized neural networks which we call neural feature repopulation. Our analysis can satisfactorily explain the empirical success of two-level neural networks that are trained by standard learning algorithms. Our key theoretical result is that in the limit of an infinite number of hidden neurons, over-parameterized two-level neural networks trained via standard (noisy) gradient descent learn a well-defined feature distribution (population), and the limiting feature distribution is nearly optimal for the underlying learning task under certain conditions. Empirical studies confirm that predictions of our theory are consistent with the results observed in real practice.


random kitchen sinks as approximation to kernel machine

#artificialintelligence

The dimension of $w$ does not make sense to me. To approximate the kernel function with sufficient accuracy we need a large $D$, which seems to carry a high risk of overfitting, and indeed my model appears to overfit when I try to use it. Shouldn't the true approximation to the kernel machine be [...]? I think the author is fitting a linear model in the feature space (since $z(x)$ is the feature map) rather than using the standard kernel trick, which never evaluates the feature map explicitly. But I don't understand why the author does not need to compute the sample average of $K$ (or do something similar with $z$). The implementation here also fits a model with $D$ parameters, with no averaging step, which I find confusing.
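A minimal sketch of the answer to this question, assuming NumPy and the standard random Fourier feature construction for the RBF kernel: no separate averaging step is needed because the $1/\sqrt{D}$ normalization inside $z$ folds the average over the $D$ random features into the ordinary dot product $z(x)^\top z(y)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 5, 5000, 0.5  # input dim, number of random features, RBF width

# Random Fourier features for k(x, y) = exp(-gamma * ||x - y||^2):
# draw w ~ N(0, 2*gamma*I) and b ~ Uniform[0, 2*pi] once, up front.
W = rng.normal(0.0, np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def z(x):
    # The sqrt(2/D) factor means z(x) @ z(y) already averages over the
    # D random features -- no explicit averaging step is required.
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))
approx = z(x) @ z(y)
```

With `D` in the thousands, `approx` tracks `exact` to within a few hundredths; the variance of the estimate shrinks like $1/D$, which is why a large $D$ improves the kernel approximation rather than directly causing overfitting (regularizing the linear weights is a separate matter).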


Fastfood: Approximate Kernel Expansions in Loglinear Time

Le, Quoc Viet, Sarlos, Tamas, Smola, Alexander Johannes

arXiv.org Machine Learning

Despite their successes, what makes kernel methods difficult to use in many large scale problems is the fact that storing and computing the decision function is typically expensive, especially at prediction time. In this paper, we overcome this difficulty by proposing Fastfood, an approximation that accelerates such computation significantly. Key to Fastfood is the observation that Hadamard matrices, when combined with diagonal Gaussian matrices, exhibit properties similar to dense Gaussian random matrices. Yet unlike the latter, Hadamard and diagonal matrices are inexpensive to multiply and store. These two matrices can be used in lieu of Gaussian matrices in Random Kitchen Sinks proposed by Rahimi and Recht (2009), thereby speeding up the computation for a large range of kernel functions. Specifically, Fastfood requires O(n log d) time and O(n) storage to compute n non-linear basis functions in d dimensions, a significant improvement from O(nd) computation and storage, without sacrificing accuracy. Our method applies to any translation invariant and any dot-product kernel, such as the popular RBF kernels and polynomial kernels. We prove that the approximation is unbiased and has low variance. Experiments show that we achieve similar accuracy to full kernel expansions and Random Kitchen Sinks while being 100x faster and using 1000x less memory. These improvements, especially in terms of memory usage, make kernel methods more practical for applications that have large training sets and/or require real-time prediction.
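The Hadamard trick at the heart of this abstract can be sketched in a few lines (a toy illustration only, not the paper's full construction): a fast Walsh-Hadamard transform applies the Hadamard matrix H in O(d log d) time and O(d) memory, so sandwiching it between cheap random diagonal matrices mimics a dense Gaussian projection without ever materializing a d x d matrix.

```python
import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform: computes H @ x (Sylvester ordering)
    # in O(d log d) without materializing the d x d Hadamard matrix H.
    x = x.copy()
    d, h = len(x), 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

rng = np.random.default_rng(1)
d = 8  # must be a power of two
x = rng.normal(size=d)

# One Fastfood-style block: random sign diagonal B, Hadamard H (applied
# implicitly by fwht), Gaussian diagonal G. Only O(d) numbers are stored.
B = rng.choice([-1.0, 1.0], size=d)
G = rng.normal(size=d)
v = G * fwht(B * x)
```

The resulting vector `v` behaves statistically like a dense Gaussian projection of `x`, which is the property Fastfood exploits to replace the Gaussian matrix in Random Kitchen Sinks.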


Randomized co-training: from cortical neurons to machine learning and back again

Balduzzi, David

arXiv.org Machine Learning

Despite its size and complexity, the human cortex exhibits striking anatomical regularities, suggesting there may be simple meta-algorithms underlying cortical learning and computation. We expect such meta-algorithms to be of interest since they need to operate quickly, scalably and effectively with little-to-no specialized assumptions. This note focuses on a specific question: How can neurons use vast quantities of unlabeled data to speed up learning from the comparatively rare labels provided by reward systems? As a partial answer, we propose randomized co-training as a biologically plausible meta-algorithm satisfying the above requirements. As evidence, we describe a biologically-inspired algorithm, Correlated Nystrom Views (XNV), that achieves state-of-the-art performance in semi-supervised learning, and sketch work in progress on a neuronal implementation.


Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning

Rahimi, Ali, Recht, Benjamin

Neural Information Processing Systems

Randomized neural networks are immortalized in this AI Koan: In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?" asked Minsky. "I am training a randomly wired neural net to play tic-tac-toe," Sussman replied. "Why is the net wired randomly?" asked Minsky. Sussman replied, "I do not want it to have any preconceptions of how to play." Minsky then shut his eyes. "Why do you close your eyes?" Sussman asked his teacher. "So that the room will be empty," replied Minsky. At that moment, Sussman was enlightened. We analyze shallow random networks with the help of concentration of measure inequalities. Specifically, we consider architectures that compute a weighted sum of their inputs after passing them through a bank of arbitrary randomized nonlinearities. We identify conditions under which these networks exhibit good classification performance, and bound their test error in terms of the size of the dataset and the number of random nonlinearities.
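The recipe the abstract describes, drawing random nonlinearities once and then fitting only the linear output weights, can be sketched as follows (a toy illustration with assumed cosine nonlinearities, synthetic Gaussian-blob data, and ridge regression for the weights; the paper's analysis allows arbitrary randomized nonlinearities):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy binary classification: two Gaussian blobs with labels -1 / +1.
n, d, D = 400, 2, 300
X = np.vstack([rng.normal(-1.5, 1.0, size=(n // 2, d)),
               rng.normal(+1.5, 1.0, size=(n // 2, d))])
y = np.hstack([-np.ones(n // 2), np.ones(n // 2)])

# "Kitchen sink" step: draw D random features and never optimize them.
W = rng.normal(size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)
Phi = np.cos(X @ W.T + b)  # n x D bank of random nonlinearities

# Learning step: solve only for the linear output weights (ridge
# regression), a convex problem with a closed-form solution.
alpha = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(D), Phi.T @ y)
acc = np.mean(np.sign(Phi @ alpha) == y)
```

Replacing the non-convex minimization over the feature parameters with randomization is exactly the trade the title refers to: only the weighted sum is fit, and the paper bounds the test error in terms of n and D.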