Two Heads Are Better than One: Simulating Large Transformers with Small Ones