Sub-Linear Memory: How to Make Performers SLiM

Neural Information Processing Systems

Recent works proposed various linear self-attention mechanisms, scaling only as O(L) for serial computation. We conduct a thorough complexity analysis of Performers, a class which includes most recent linear Transformer mechanisms.






Network-to-Network Regularization: Enforcing Occam's Razor to Improve Generalization

Neural Information Processing Systems

What makes a classifier have the ability to generalize? There have been a lot of important attempts to address this question, but a clear answer is still elusive.