Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
