Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
