Goto

Collaborating Authors

in International Conference on Learning Representations





On Exact Computation with an Infinitely Wide Neural Net

Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Russ R. Salakhutdinov, Ruosong Wang

Neural Information Processing Systems

Moreover, at random initialization, $H(0)$ converges to a deterministic kernel matrix $H^*$ as the width goes to infinity, which is the Neural Tangent Kernel $\ker(\cdot,\cdot)$ (Equation (2)) evaluated on the training data. If $H(t) = H^*$ for all $t$, then (3) becomes $\frac{du(t)}{dt} = -H^*\,(u(t) - y)$. Suppose $\sigma(z) = \max(0, z)$, $1/\kappa = \mathrm{poly}(1/\varepsilon, \log(n/\delta))$ and $d_1 = d_2 = \cdots = d_L = m$ with $m \ge \mathrm{poly}(1/\kappa, L, 1/\lambda_0, n, \log(1/\delta))$.
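
The linear dynamics in the excerpt admit a closed-form solution, which the following sketch evaluates numerically. This is an illustration, not the paper's code: the kernel matrix `H_star`, the inputs `X`, and the labels `y` below are synthetic placeholders standing in for the NTK Gram matrix $H^*$ evaluated on the training data.

```python
import numpy as np
from scipy.linalg import expm

# Synthetic stand-ins for the NTK Gram matrix H* and the labels y.
# (The paper computes H* exactly from the architecture; here a Gaussian
# kernel on random inputs merely supplies a positive-definite matrix.)
rng = np.random.default_rng(0)
n = 5
X = rng.normal(size=(n, 3))
H_star = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
y = rng.normal(size=n)
u0 = np.zeros(n)  # network outputs at initialization (placeholder)

def u(t):
    """Closed form of du/dt = -H*(u - y): u(t) = y + expm(-H* t) @ (u(0) - y)."""
    return y + expm(-H_star * t) @ (u0 - y)

# ||u(t) - y|| decays to zero at a rate set by the spectrum of H*,
# matching the convergence behaviour quoted above.
for t in [0.0, 1.0, 10.0, 100.0]:
    print(f"t = {t:6.1f}   ||u(t) - y|| = {np.linalg.norm(u(t) - y):.6f}")
```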


Practical Deep Learning with Bayesian Principles

Kazuki Osawa, Siddharth Swaroop, Mohammad Emtiyaz E. Khan, Anirudh Jain, Runa Eschenhagen, Richard E. Turner, Rio Yokota

Neural Information Processing Systems

Figure 2: the distributed training algorithm. Momentum: it is well known that momentum improves training, and it is used here in the same way as in Adam, where β₁ plays the role of the momentum µ. Weights are initialized with init.xavier_normal. On calibration metrics such as AUROC the method is second-best or better and compares favourably with Adam, including on ImageNet; the comparisons follow the evaluation protocols of [16, 31, 8, 32] and borrow settings from [16, 30].
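
A brief sketch, assuming standard PyTorch, of the two practical choices named in the excerpt: Xavier-normal weight initialization (`nn.init.xavier_normal_`, the `init.xavier_normal` mentioned above) and an Adam-style first-moment update in which β₁ plays the role of the momentum µ. The layer sizes and the `FirstMoment` helper are hypothetical illustrations, not the paper's released implementation.

```python
import torch
import torch.nn as nn

# Xavier-normal initialization, as mentioned in the excerpt.
layer = nn.Linear(784, 100)  # hypothetical layer sizes
nn.init.xavier_normal_(layer.weight)

class FirstMoment:
    """Adam-style momentum: an exponential moving average of gradients,
    with beta1 playing the role of the momentum mu (illustrative helper)."""

    def __init__(self, shape, beta1=0.9):
        self.beta1 = beta1
        self.m = torch.zeros(shape)

    def step(self, grad):
        # m <- beta1 * m + (1 - beta1) * grad, as in Adam's first moment.
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * grad
        return self.m  # used as the search direction instead of the raw gradient

mom = FirstMoment(layer.weight.shape)
direction = mom.step(torch.randn_like(layer.weight))  # dummy gradient
```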



Exponential Family Estimation via Adversarial Dynamics Embedding

Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans

Neural Information Processing Systems

Theorem 1 (Fenchel dual of the log-partition function (Wainwright and Jordan, 2008)). Let $H(q) := -\int q(x)\,\log q(x)\,dx$ denote the entropy. Compared with adversarial optimization (Goodfellow et al., 2014; Arjovsky et al., 2017; Dai et al., 2017), the order of the min-max in (20) is reversed; the major difference is that the learned adversarial sampler shares parameters, whose updates empirically accelerate training. A similar optimization couples (13) with (17).
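
A small numerical check, in the finite-domain case, of the duality stated in Theorem 1: the log-partition $A(f) = \log \sum_x \exp f(x)$ equals $\mathbb{E}_q[f] + H(q)$ at the maximizer $q^*(x) \propto \exp(f(x))$ and upper-bounds it for every other distribution. The potentials `f` below are random toy values, not from the paper.

```python
import numpy as np
from scipy.special import logsumexp, softmax

rng = np.random.default_rng(0)
f = rng.normal(size=6)   # toy potentials f(x) on a 6-point domain

A = logsumexp(f)         # log-partition: log sum_x exp(f(x))

def dual_objective(q):
    """E_q[f] + H(q), the Fenchel-dual objective of Theorem 1 (discrete case)."""
    q = np.clip(q, 1e-12, None)
    return q @ f - q @ np.log(q)

q_star = softmax(f)      # maximizer: q*(x) proportional to exp(f(x))
assert np.isclose(A, dual_objective(q_star))   # the dual value attains A(f)
assert dual_objective(np.full(6, 1 / 6)) <= A  # any other q scores lower
print(A, dual_objective(q_star))
```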