General Information Metrics for Improving AI Model Training Efficiency

Jianfeng Xu, Congcong Liu, Xiaoying Tan, Xiaojie Zhu, Anpeng Wu, Huan Wan, Weijun Kong, Chun Li, Hu Xu, Kun Kuang, Fei Wu

arXiv.org Artificial Intelligence 

Artificial intelligence (AI) is transforming numerous aspects of contemporary life, with advancements fueled largely by the training of models on extensive datasets (Pouyanfar et al. 2018; S. Dong et al. 2021; Bialkova 2024). This is particularly evident in areas such as autonomous driving (S. Liu et al. 2024; C. Cui et al. 2024), generative AI (Feuerriegel et al. 2024; Huang et al. 2024), and medical image processing (Tian et al. 2024; Alzubaidi et al. 2024), all of which depend on large-scale model training. As these models expand to hundreds of billions of parameters, the need for high-quality training data becomes critical (Zhao et al. 2023; Minaee et al. 2024). Training such large-scale models often requires tens to hundreds of trillions of tokens, months of interdisciplinary effort, and vast computational resources, including thousands of GPUs and substantial energy consumption (Achiam et al. 2023; Touvron, Lavril, et al. 2023; Touvron, Martin, et al. 2023; Chowdhery et al. 2023). A core challenge is ensuring that training data is meticulously curated: ineffective data selection can yield models that underperform, fall short of desired objectives, and waste considerable resources (Chowdhery et al. 2023; Gunasekar et al. 2023b). Thus, once the model architecture and algorithms are defined, the quality of the training data becomes paramount to a model's success, significantly influencing the performance and relevance of AI technologies across various domains (Hamid 2023; Zha et al. 2023). By focusing on data quality, small-scale models can achieve performance comparable to that of much larger models. For instance, Phi-1.5 achieves performance on par with models 5 times its size, while Phi-2 matches or even surpasses the performance of models 25 times larger (Gunasekar et al. 2023a; Y. Li et al. 2023).