BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers With Limited Data

Open in new window