Intra-Layer Recurrence in Transformers for Language Modeling
