Pretraining on the Test Set Is All You Need

Schaeffer, Rylan

arXiv.org Artificial Intelligence 

Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating a novel, high-quality, non-synthetic data mixture based solely on evaluation benchmarks. Using our novel dataset mixture consisting of fewer than 100 thousand tokens, we pretrain a 1 million parameter transformer-based LLM phi-CTNL (pronounced "fictional") that achieves perfect results across diverse academic benchmarks, strictly outperforming all known foundation models. phi-CTNL also beats power-law scaling and exhibits a never-before-seen grokking-like ability to accurately predict downstream evaluation benchmarks' canaries.

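The abstract only gestures at the (satirical) recipe, so the following is a minimal, purely illustrative sketch of what "pretraining solely on evaluation benchmarks" could look like: a roughly 1-million-parameter decoder-only transformer trained directly on verbatim test items until it memorizes them. The data, the model name TinyLM, and all hyperparameters are hypothetical placeholders, not details from the paper.

```python
# Illustrative only: "pretraining on the test set" as satirized in the abstract.
# The benchmark examples below are placeholders, not real evaluation data.
import torch
import torch.nn as nn

# Hypothetical "data mixture": verbatim benchmark test items (well under 100k tokens).
benchmark_test_set = [
    "Q: What is 2 + 2? A: 4",
    "Q: What is the capital of France? A: Paris",
]

# Character-level tokenizer keeps the sketch self-contained.
vocab = sorted(set("".join(benchmark_test_set)))
stoi = {ch: i for i, ch in enumerate(vocab)}

def encode(s: str) -> torch.Tensor:
    return torch.tensor([stoi[ch] for ch in s], dtype=torch.long)

# A decoder-only transformer of roughly 1M parameters, in the spirit of "phi-CTNL".
class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 128, n_layers: int = 4, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.embed(x)
        # Causal mask so each position only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.blocks(h, mask=mask)
        return self.head(h)

model = TinyLM(len(vocab))
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# "Pretraining": memorize the evaluation data; perfect benchmark scores follow trivially.
for step in range(200):
    for example in benchmark_test_set:
        ids = encode(example).unsqueeze(0)
        logits = model(ids[:, :-1])
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), ids[:, 1:].reshape(-1)
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because the training corpus and the evaluation set are identical, next-token loss on the benchmarks collapses to zero, which is exactly the point the abstract is lampooning.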