Insights into Pre-training via Simpler Synthetic Tasks

May-27-2025, 13:37:44 GMT–Neural Information Processing Systems

Pre-training produces representations that are effective for a wide range of downstream tasks, but it is still unclear which properties of pre-training are necessary for these gains. Notably, recent work shows that even pre-training on synthetic tasks can achieve significant gains on downstream tasks. In this work, we perform three experiments that iteratively simplify pre-training and show that the simplifications still retain much of its gains. First, building on prior work, we perform a systematic evaluation of three existing synthetic pre-training methods on six downstream tasks. We find that the best synthetic pre-training method, LIME, attains an average of 67% of the benefits of natural pre-training.
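One way to read a figure such as "67% of the benefits of natural pre-training" is as a ratio of improvements over a no-pre-training baseline, averaged across tasks. The abstract does not state the exact formula, so the sketch below is an assumption: the function names, the three-score layout (from-scratch, synthetic, natural), and the example numbers are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def benefit_fraction(scratch: float, synthetic: float, natural: float) -> float:
    """Share of the natural pre-training gain recovered by synthetic pre-training.

    Assumed definition (not confirmed by the abstract):
        (synthetic - scratch) / (natural - scratch)
    """
    gap = natural - scratch
    if gap == 0:
        raise ValueError("natural pre-training shows no gain over the baseline")
    return (synthetic - scratch) / gap


def average_benefit_fraction(results: dict[str, tuple[float, float, float]]) -> float:
    """Average the per-task fraction over all downstream tasks."""
    return mean(benefit_fraction(*scores) for scores in results.values())


# Made-up scores per task: (from scratch, synthetic pre-training, natural pre-training).
example = {
    "task_a": (50.0, 55.0, 60.0),  # recovers 5 of 10 points
    "task_b": (40.0, 47.5, 50.0),  # recovers 7.5 of 10 points
}

print(average_benefit_fraction(example))
```

Under this assumed definition, a value of 1.0 would mean synthetic pre-training matches natural pre-training on average, and 0.0 would mean it gives no improvement over training from scratch.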

downstream task, pre-training, simpler synthetic task, (3 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)