Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach
Prince Aboagye, Yan Zheng, Junpeng Wang, Uday Singh Saini, Xin Dai, Michael Yeh, Yujie Fan, Zhongfang Zhuang, Shubham Jain, Liang Wang, Wei Zhang
The emergence of pre-trained models has significantly impacted Natural Language Processing (NLP), Computer Vision, and relational datasets. Traditionally, these models are assessed through fine-tuned downstream tasks, which raises the question of how to evaluate them more efficiently and effectively. In this study, we explore a novel approach in which we leverage the meta-features associated with each entity as a source of worldly knowledge and employ the entity representations produced by the models. We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models. Our method's effectiveness is demonstrated across various domains, including models trained on relational datasets, large language models, and image models.

Pre-training large models is becoming increasingly common across machine learning applications, thanks to the growing amount of user-generated content. This is evident in areas such as Natural Language Processing (NLP), with models like GPT (Generative Pre-trained Transformer), and in the vision-language domain, with models like CLIP. Typically, the effectiveness of these models is evaluated using downstream tasks; however, such evaluations can be relatively costly if every task needs to be performed.
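The abstract only sketches the core idea: score a pre-trained model by how consistently its entity representations agree with the entities' meta-features. The snippet below is a minimal illustration of that general notion using a clustering-based agreement score (normalized mutual information); it is not the paper's multi-head posterior method, and the function and variable names are hypothetical.

    # Illustrative sketch: rank models by how well their entity embeddings
    # align with a categorical meta-feature (assumption: meta-features are
    # categorical labels; the paper's actual estimator differs).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import normalized_mutual_info_score

    def metafeature_consistency(embeddings: np.ndarray, meta_labels: np.ndarray) -> float:
        """Cluster entity embeddings and measure agreement with meta-feature labels.

        embeddings: (n_entities, d) entity representations from a pre-trained model.
        meta_labels: (n_entities,) categorical meta-feature values.
        Returns a score in [0, 1]; higher means the embeddings reflect the meta-feature better.
        """
        n_clusters = len(np.unique(meta_labels))
        cluster_ids = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
        return normalized_mutual_info_score(meta_labels, cluster_ids)

    if __name__ == "__main__":
        # Synthetic comparison of two hypothetical models on the same entities.
        rng = np.random.default_rng(0)
        labels = rng.integers(0, 5, size=1000)                 # meta-feature categories
        model_a = rng.normal(size=(1000, 64)) + labels[:, None]  # embeddings aligned with labels
        model_b = rng.normal(size=(1000, 64))                    # embeddings ignoring labels
        print("model A:", round(metafeature_consistency(model_a, labels), 3))
        print("model B:", round(metafeature_consistency(model_b, labels), 3))

Under this toy setup, a model whose embedding space separates entities along the meta-feature scores near 1, while one that ignores it scores near 0, which is the kind of downstream-task-free comparison the paper argues for.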
arXiv.org Artificial Intelligence
Jan-15-2024