Enhancing Latent Computation in Transformers with Latent Tokens
