
Taking advantage of an analogy between Transformer stages and the evolution of a dynamical system of multiple interacting particles, we formulate a temporal evolution scheme, TransEvolve, to bypass costly dot-product attention over multiple stacked layers.