Compositional Reasoning with Transformers, RNNs, and Chain of Thought

Neural Information Processing Systems 

It is well understood that different neural network architectures are suited to different tasks, but is there always a single best architecture for a given task? We compare the expressive power of transformers, RNNs, and transformers with chain-of-thought tokens on a simple and natural class of tasks we term Compositional Reasoning Questions (CRQ).