Online Adaptive Methods, Universality and Acceleration

Kfir Y. Levy, Alp Yurtsever, Volkan Cevher

Neural Information Processing Systems

Conversely, adaptive first-order methods are very popular in Machine Learning, with AdaGrad [12] being the most prominent method among this class. AdaGrad is an online learning algorithm which adapts its learning rate using the feedback (gradients) received through the optimization process, and is known to successfully handle noisy feedback.
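The per-coordinate adaptation described above can be sketched in a few lines of NumPy. This is a minimal illustration of the AdaGrad update rule, not the authors' code; the toy objective f(w) = ||w||^2, the learning rate, and the iteration count are all illustrative assumptions.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update: the effective per-coordinate learning rate
    shrinks with the running sum of squared gradients."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy objective f(w) = ||w||^2 with gradient 2w (illustrative only).
w = np.array([3.0, -4.0])
accum = np.zeros_like(w)
for _ in range(1000):
    w, accum = adagrad_step(w, 2 * w, accum)
```

Coordinates with larger accumulated gradients receive smaller effective steps, which is what makes the method robust to noisy, poorly scaled feedback.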


Appendix

Neural Information Processing Systems

In this way, YoutubeDNN can be made compatible with non-sequential recommendation tasks. SCANN and IPNSW are built on the learned representations of YoutubeDNN.


e8dbeb1c947a30576c699e7f5c73d3e3-Paper-Conference.pdf

Neural Information Processing Systems

However, within this specific application domain, existing VAE methods are restricted to using only one layer of latent variables and strictly Gaussian posterior approximations.


Learning to Confuse: Generating Training Time Adversarial Data with Auto-Encoder

Ji Feng, Qi-Zhi Cai, Zhi-Hua Zhou

Neural Information Processing Systems

This can be formulated as a non-linear equality-constrained optimization problem. Unlike GANs, solving such a problem is computationally challenging, so we propose a simple yet effective procedure that decouples the alternating updates of the two networks for stability. By teaching the perturbation generator to hijack the training trajectory of the victim classifier, the generator learns to move against the victim classifier step by step.
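The alternating-update idea can be caricatured with a much-simplified min-max loop: a victim classifier descends its loss on perturbed data while a bounded perturbation (standing in for the generator) ascends that same loss. This sketch is an assumption-laden toy, not the paper's actual decoupling procedure or auto-encoder generator; the logistic-regression victim, learning rates, and clipping bound are all invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))                 # toy training data
y = np.sign(X @ np.array([1.0, -1.0]))       # ground-truth labels

w = np.zeros(2)                              # victim: logistic regression
delta = np.zeros((64, 2))                    # per-example perturbations

for _ in range(100):
    Xp = X + delta
    s = sigmoid(-y * (Xp @ w))               # sigma(-margin) per example
    # Victim step: gradient descent on the logistic loss over perturbed data.
    grad_w = -((Xp * (y * s)[:, None]).mean(axis=0))
    w -= 0.5 * grad_w
    # Perturbation step: gradient ascent on the same loss w.r.t. the inputs,
    # moving the data against the victim's current decision boundary.
    grad_x = -(y * s)[:, None] * w
    delta = np.clip(delta + 0.1 * grad_x, -0.5, 0.5)
```

The clipping keeps perturbations bounded, echoing the requirement that the generated adversarial training data stay close to the clean data.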


cf9dc5e4e194fc21f397b4cac9cc3ae9-Paper.pdf

Neural Information Processing Systems

However, the structure of their hidden-layer representations is only theoretically well understood in certain infinite-width limits, in which these representations cannot flexibly adapt to learn data-dependent features [3-11, 24]. In the Bayesian setting, these representations are described by fixed, deterministic kernels [3-11].




BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling

Neural Information Processing Systems

Different from the original MoCo [19] and its cross-modal versions [15, 33, 35] that utilize momentum update for only momentum encoders to maintain a large consistent queue, our BMU strategy imposes momentum update on both momentum encoders and (video/text) encoders.
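The bidirectional momentum update described above can be sketched as two exponential-moving-average passes over parameter lists. This is a hedged toy illustration with scalar stand-ins for encoder parameters, not the paper's implementation; the momentum coefficients and initial values are assumptions.

```python
def momentum_update(target_params, source_params, m):
    """EMA update: target <- m * target + (1 - m) * source."""
    return [m * t + (1 - m) * s for t, s in zip(target_params, source_params)]

# Toy scalar "parameters" for an encoder and its momentum encoder.
enc = [1.0, 2.0]
mom_enc = [0.0, 0.0]

# Standard MoCo direction: the momentum encoder slowly tracks the encoder.
mom_enc = momentum_update(mom_enc, enc, m=0.99)
# BMU's additional reverse direction: the encoder is also momentum-updated
# toward the momentum encoder, as the abstract describes.
enc = momentum_update(enc, mom_enc, m=0.9)
```

The standard direction keeps the negative queue consistent; the reverse direction pulls the trained encoder back toward the slowly moving one, which is the "bidirectional" part relevant to the continual-learning setting.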