Domain Pre-training Impact on Representations
Gonzalez-Gutierrez, Cesar, Quattoni, Ariadna
arXiv.org Artificial Intelligence
This empirical study analyzes the effects of the pre-training corpus on the quality of learned transformer representations. We focus on the representation quality induced solely through pre-training. Our experiments show that pre-training on a small, specialized corpus can yield effective representations, and that the success of combining a generic and a specialized corpus depends on the distributional similarity between the target task and the specialized corpus.
Jun-2-2025
- Country:
  - Asia
    - China > Hong Kong (0.05)
    - Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
    - Thailand > Bangkok > Bangkok (0.04)
  - Europe
    - Germany > Berlin (0.04)
    - Ireland > Leinster > County Dublin > Dublin (0.04)
    - Spain > Catalonia > Barcelona Province > Barcelona (0.04)
  - North America
    - Canada > Ontario > Toronto (0.04)
    - Cuba > Artemisa Province > Artemisa (0.04)
    - Dominican Republic (0.04)
    - United States
      - Louisiana > Orleans Parish > New Orleans (0.04)
      - Massachusetts > Middlesex County > Cambridge (0.04)
      - Minnesota > Hennepin County > Minneapolis (0.14)
      - Oregon > Multnomah County > Portland (0.04)
      - Washington > King County > Seattle (0.04)
  - Oceania > Australia
- Genre:
  - Research Report (0.82)
- Technology: