Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems

Neural Information Processing Systems 

Taking advantage of an analogy between Transformer stages and the evolution of a dynamical system of multiple interacting particles, we formulate a temporal evolution scheme, TransEvolve, to bypass costly dot-product attention over multiple stacked layers.
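
To make the analogy concrete, the sketch below treats each token vector as a particle and one residual attention layer as a single forward-Euler time step, x <- x + dt * F(x). It illustrates the general dynamical-systems reading of a Transformer layer only; it is not the TransEvolve scheme, whose details are not given here. The function names (`interaction`, `euler_step`), the step size `dt`, and the use of identity query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interaction(particles):
    # Attention-like pairwise interaction term F(x).
    # Assumption: identity query/key/value projections, single head.
    d = len(particles[0])
    forces = []
    for xi in particles:
        # Scaled dot-product score between particle i and every particle j.
        scores = [sum(a * b for a, b in zip(xi, xj)) / math.sqrt(d)
                  for xj in particles]
        weights = softmax(scores)
        # Softmax-weighted average of all particle states.
        forces.append([sum(w * xj[k] for w, xj in zip(weights, particles))
                       for k in range(d)])
    return forces


def euler_step(particles, dt=1.0):
    # One forward-Euler update; with dt = 1 this has the same form as a
    # residual connection around an attention sublayer.
    forces = interaction(particles)
    return [[x + dt * f for x, f in zip(xi, fi)]
            for xi, fi in zip(particles, forces)]


def evolve(particles, steps, dt=1.0):
    # Stacking layers corresponds to iterating the time step.
    for _ in range(steps):
        particles = euler_step(particles, dt)
    return particles
```

In this reading, every stacked layer recomputes the pairwise scores, which is where the quadratic cost of dot-product attention comes from.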
