Neural Information Processing Systems 

Note that the theoretical equivalence requires near-zero initialization, gradient flow (small learning rate), and a large number of channels. Meeting these requirements demands significant computational resources. One of the advantages of kernel methods is that they require little computation on a small dataset, which is a very appealing feature for architecture search.
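
To make the cost argument concrete, the following is a minimal sketch of kernel regression on a tiny dataset: fitting reduces to a single linear solve against an n-by-n kernel matrix, with no iterative training. The RBF kernel here is only a stand-in assumption, not the architecture-induced kernel discussed in this response, and the function names, the toy data, and the `reg` and `gamma` parameters are all hypothetical choices for illustration.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian (RBF) kernel; an assumed stand-in for an architecture-specific kernel.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A z = b.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def kernel_ridge_fit(X, y, reg=1e-6, gamma=1.0):
    # Solve (K + reg * I) alpha = y; this one solve is the entire "training" step.
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) + (reg if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def kernel_ridge_predict(X_train, alpha, x, gamma=1.0):
    # Prediction is a kernel-weighted sum over the training points.
    return sum(a * rbf_kernel(xi, x, gamma) for a, xi in zip(alpha, X_train))

# Toy data (hypothetical): four 1-D points with targets x^2.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]
alpha = kernel_ridge_fit(X, y)
train_preds = [kernel_ridge_predict(X, alpha, x) for x in X]
```

With n training points the solve costs on the order of n^3 operations, which is negligible for a small dataset, whereas approaching the same predictor by training a network would need many small gradient steps on a very wide model.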