What can we learn from signals and systems in a transformer? Insights for probabilistic modeling and inference architecture
Chang, Heng-Sheng, Mehta, Prashant G.
arXiv.org Artificial Intelligence
In the 1940s, Wiener introduced a linear predictor, in which the future prediction is computed by linearly combining the past data. A transformer generalizes this idea: it is a nonlinear predictor in which the next-token prediction is computed by nonlinearly combining the past tokens. In this essay, we present a probabilistic model that interprets transformer signals as surrogates of conditional measures, and layer operations as fixed-point updates. An explicit form of the fixed-point update is described for the special case when the probabilistic model is a hidden Markov model (HMM). In part, this paper is an attempt to bridge classical nonlinear filtering theory and modern inference architectures.
Aug-29-2025
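For readers unfamiliar with the filtering side of the abstract, the following is a minimal sketch of the classical HMM forward filter. It is the textbook recursion from nonlinear filtering theory, not the paper's fixed-point update, whose explicit form is not given in the abstract. The function names, the two-state matrices `A` and `C`, and the observation sequence are all illustrative assumptions. The sketch shows how a conditional measure over the hidden state is carried forward and turned into a next-observation ("next-token") prediction that depends nonlinearly on the past, because of the normalization step.

```python
# Textbook HMM forward filter (an illustration, not the paper's method).
# Assumed conventions (hypothetical, chosen for this sketch):
#   A[i][j] = P(x_t = j | x_{t-1} = i)   transition matrix
#   C[j][y] = P(y_t = y | x_t = j)       emission matrix
#   belief  = conditional distribution of the hidden state given past observations


def hmm_filter_step(belief, A, C, obs):
    """Update the conditional distribution of the state with one new observation."""
    n = len(belief)
    # Prediction: push the current belief through the transition matrix.
    predicted = [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]
    # Correction: weight by the likelihood of the observation, then normalize.
    unnormalized = [predicted[j] * C[j][obs] for j in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def predict_next_obs(belief, A, C):
    """Distribution of the next observation given all past observations."""
    n = len(belief)
    m = len(C[0])
    predicted = [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]
    return [sum(predicted[j] * C[j][y] for j in range(n)) for y in range(m)]


# Illustrative two-state, two-symbol model (values are made up).
A = [[0.9, 0.1], [0.2, 0.8]]
C = [[0.8, 0.2], [0.3, 0.7]]

belief = [0.5, 0.5]          # prior over the hidden state
for y in [0, 0, 1]:          # past "tokens"
    belief = hmm_filter_step(belief, A, C, y)

next_token_probs = predict_next_obs(belief, A, C)
print(belief, next_token_probs)
```

The division by `total` is what makes the predictor nonlinear in the past observations, in contrast to a Wiener-style predictor that forms a fixed linear combination of past data.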