
Collaborating Authors

 Du et al.


dbc4d84bfcfe2284ba11beffb853a8c4-AuthorFeedback.pdf

Neural Information Processing Systems

Note that the theoretical equivalence requires near-zero initialization, gradient flow (i.e., a small learning rate), and a large number of channels. These require significant computational resources. One of the advantages of kernel methods is that they require little computation on a small dataset, which is a very appealing feature for architecture search.
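As a point of reference, the equivalence being discussed can be written in the standard neural tangent kernel (NTK) form; the equations below are the usual gradient-flow dynamics for squared loss, supplied here for readability rather than quoted from the feedback. Under gradient flow from near-zero initialization, and as the number of channels grows, the network's prediction f_t(x) evolves as kernel regression with a fixed tangent kernel:

\[
\frac{d}{dt} f_t(x) = -\sum_{i=1}^{n} \Theta(x, x_i)\,\bigl(f_t(x_i) - y_i\bigr),
\qquad
\Theta(x, x') = \bigl\langle \nabla_\theta f_0(x),\, \nabla_\theta f_0(x') \bigr\rangle,
\]

so the limiting predictor is the kernel-regression solution f_\infty(x) = \Theta(x, X)\,\Theta(X, X)^{-1} y. On a small dataset this only requires forming and solving an n-by-n kernel system, which is the computational advantage alluded to above.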


An Improved Analysis of Training Over-parameterized Deep Neural Networks

Difan Zou, Quanquan Gu

Neural Information Processing Systems

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network required to ensure global convergence is very stringent, often a high-degree polynomial in the training sample size n (e.g., O(n^{24})).
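To see how stringent such a width requirement is, consider a quick back-of-the-envelope calculation (a hypothetical illustration, not taken from the paper): with a training set of only a thousand examples,

\[
n = 10^{3} \;\Longrightarrow\; n^{24} = 10^{72},
\]

so a width condition of order n^{24} would nominally demand around 10^{72} neurons per layer, far beyond anything trainable in practice. This is why reducing the polynomial dependence of the required width on n is the central quantity of interest.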





An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap

Neural Information Processing Systems

A fundamental question in the theory of reinforcement learning is: suppose the optimal Q-function lies in the linear span of a given d-dimensional feature mapping; is sample-efficient reinforcement learning (RL) possible? The recent and remarkable result of Weisz et al. (2020) resolves this question in the negative, providing an exponential (in d) sample size lower bound, which holds even if the agent has access to a generative model of the environment. One may hope that such a lower bound can be circumvented with an even stronger assumption that there is a constant gap between the optimal Q-value of the best action and that of the second-best action (for all states); indeed, the construction in Weisz et al.
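For context, the two assumptions referenced in the abstract can be stated in their standard form (written out here for readability, not quoted from the paper). Linear realizability of the optimal Q-function with respect to a feature mapping \phi : S \times A \to \mathbb{R}^d, and a constant suboptimality gap \Delta, read:

\[
Q^*(s, a) = \bigl\langle \phi(s, a),\, \theta^* \bigr\rangle \quad \text{for some } \theta^* \in \mathbb{R}^d,
\]
\[
Q^*\bigl(s, \pi^*(s)\bigr) - Q^*(s, a) \;\ge\; \Delta \quad \text{for all } s \text{ and all } a \neq \pi^*(s).
\]

The lower bound discussed above concerns whether these conditions suffice for sample-efficient RL.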