Self-Attention as Distributional Projection: A Unified Interpretation of Transformer Architecture
