Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones?
