Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems

Neural Information Processing Systems

Taking advantage of an analogy between Transformer stages and the evolution of a dynamical system of multiple interacting particles, we formulate a temporal evolution scheme, TransEvolve, to bypass costly dot-product attention over multiple stacked layers.
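
The analogy mentioned above can be illustrated with a minimal NumPy sketch. It is not the TransEvolve scheme itself, which the abstract does not specify. It only shows the common reading in which tokens act as particles and a residual attention layer acts as one explicit Euler step of their evolution. The names `interaction` and `euler_step`, the step size `dt`, and the weights shared across steps are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def interaction(X, Wq, Wk, Wv):
    # Single-head dot-product attention, read here as a pairwise
    # interaction term between particles (one row of X per token).
    d = Wq.shape[1]
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    return softmax(scores, axis=-1) @ (X @ Wv)

def euler_step(X, Wq, Wk, Wv, dt=1.0):
    # Residual connection viewed as an explicit Euler update:
    # X(t + dt) = X(t) + dt * F(X(t)).
    return X + dt * interaction(X, Wq, Wk, Wv)

rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8
X0 = rng.normal(size=(n_tokens, d_model))
# Illustrative assumption: the same weights are reused at every step.
Wq, Wk, Wv = (0.1 * rng.normal(size=(d_model, d_model)) for _ in range(3))

# Three stacked "layers" correspond to three time steps of the system.
X = X0
for _ in range(3):
    X = euler_step(X, Wq, Wk, Wv)
```

In this reading, every stacked layer recomputes the full pairwise attention matrix, which is the per-layer cost that the abstract says TransEvolve aims to bypass.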