Taking advantage of an analogy between Transformer stages and the evolution of a dynamical system of multiple interacting particles, we formulate a temporal evolution scheme, TransEvolve, to bypass costly dot-product attention over multiple stacked layers.
While recent NLP evaluation benchmark tasks test some aspects of human-imitative behavior (e.g., BIG-bench's 'human-like behavior' tasks), few, if any, examine creative problem-solving abilities.
We propose a new perspective on the Fourier transform, viewing it in terms of basis functions. Specifically, the real and imaginary parts of the frequency components can be viewed as the coefficients of cosine and sine basis functions at tiered frequency levels, respectively.
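
The basis-function reading can be checked numerically. The following is a minimal sketch, not the authors' method: it assumes the common discrete Fourier transform convention X[k] = sum_n x[n] exp(-2*pi*i*k*n/N), under which a real signal is recovered as a weighted sum of cosines (weights Re X[k]) and sines (weights -Im X[k]), scaled by 1/N. The sign and scaling depend on that assumed convention, and the function names `dft` and `reconstruct_from_bases` are hypothetical, introduced only for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, assuming X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_samples = len(x)
    return [
        sum(
            x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        for k in range(n_samples)
    ]


def reconstruct_from_bases(spectrum):
    """Rebuild a real signal from cosine and sine basis functions.

    Under the convention assumed above, Re X[k] weights cos(2*pi*k*n/N)
    and -Im X[k] weights sin(2*pi*k*n/N); the sum is divided by N.
    """
    n_samples = len(spectrum)
    signal = []
    for n in range(n_samples):
        total = 0.0
        for k, coeff in enumerate(spectrum):
            angle = 2 * math.pi * k * n / n_samples
            total += coeff.real * math.cos(angle) - coeff.imag * math.sin(angle)
        signal.append(total / n_samples)
    return signal


# Arbitrary real-valued test signal (illustrative values only).
signal = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]
spectrum = dft(signal)
rebuilt = reconstruct_from_bases(spectrum)

# Largest pointwise difference between the original and rebuilt signal.
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(max_error)
```

The reconstruction uses only real arithmetic on the real and imaginary parts, which is what makes the "coefficients of cosine and sine basis functions" interpretation concrete: each frequency index k contributes one cosine and one sine term at that frequency.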