General Information Metrics for Improving AI Model Training Efficiency

Jianfeng Xu, Congcong Liu, Xiaoying Tan, Xiaojie Zhu, Anpeng Wu, Huan Wan, Weijun Kong, Chun Li, Hu Xu, Kun Kuang, Fei Wu

arXiv.org Artificial Intelligence 

Artificial intelligence (AI) is transforming numerous aspects of contemporary life, with advancements fueled largely by the training of models on extensive datasets (Pouyanfar et al. 2018; S. Dong et al. 2021; Bialkova 2024). This is particularly evident in areas such as autonomous driving (S. Liu et al. 2024; C. Cui et al. 2024), generative AI (Feuerriegel et al. 2024; Huang et al. 2024), and medical image processing (Tian et al. 2024; Alzubaidi et al. 2024), all of which depend on large-scale model training. As these models expand to hundreds of billions of parameters, the need for high-quality training data becomes critical (Zhao et al. 2023; Minaee et al. 2024). Training such large-scale models often requires tens to hundreds of trillions of tokens, months of interdisciplinary effort, and vast computational resources, including thousands of GPUs and substantial energy consumption (Achiam et al. 2023; Touvron, Lavril, et al. 2023; Touvron, Martin, et al. 2023; Chowdhery et al. 2023). A core challenge is ensuring that training data is meticulously curated: ineffective data selection can yield models that underperform, fall short of desired objectives, and waste considerable resources (Chowdhery et al. 2023; Gunasekar et al. 2023b). Thus, once the model architecture and algorithms are defined, the quality of the training data becomes paramount to a model's success, significantly influencing the performance and relevance of AI technologies across various domains (Hamid 2023; Zha et al. 2023). By focusing on data quality, small-scale models can achieve performance comparable to that of much larger models. For instance, Phi-1.5 achieves performance on par with models 5 times its size, while Phi-2 matches or even surpasses the performance of models 25 times larger (Gunasekar et al. 2023a; Y. Li et al. 2023).