Achieving and Understanding Out-of-Distribution Generalization in Systematic Reasoning in Small-Scale Transformers