Two Heads Are Better than One: Simulating Large Transformers with Small Ones
