Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors
