PeaTMOSS: Mining Pre-Trained Models in Open-Source Software

Wenxin Jiang, Jason Jones, Jerin Yasmin, Nicholas Synovic, Rajeev Sashti, Sophie Chen, George K. Thiruvathukal, Yuan Tian, James C. Davis

Oct-5-2023–arXiv.org Artificial Intelligence

Developing and training deep learning models is expensive, so software engineers have begun to reuse pre-trained deep learning models (PTMs) and fine-tune them for downstream tasks. Despite the widespread use of PTMs, we know little about the corresponding software engineering behaviors and challenges. To enable the study of software engineering with PTMs, we present the PeaTMOSS dataset: Pre-Trained Models in Open-Source Software. PeaTMOSS has three parts: a snapshot of (1) 281,638 PTMs, (2) 27,270 open-source software repositories that use PTMs, and (3) a mapping between PTMs and the projects that use them. We challenge PeaTMOSS miners to discover software engineering practices around PTMs. A demo and link to the full dataset are available at: https://github.com/PurdueDualityLab/PeaTMOSS-Demos.
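
To make the three-part structure concrete, the sketch below models it as a small in-memory SQLite database: one table of PTMs, one table of repositories, and a link table that maps between them. The table names, column names, and sample rows are hypothetical and chosen only for illustration; they are not taken from the actual PeaTMOSS schema, which should be checked against the demo repository.

```python
import sqlite3


def build_demo_db():
    """Create a toy database with the three parts described in the abstract.

    The schema and rows are hypothetical placeholders, not PeaTMOSS data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE model (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE repository (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
        -- Many-to-many mapping between PTMs and the projects that use them.
        CREATE TABLE model_to_repository (
            model_id INTEGER REFERENCES model(id),
            repository_id INTEGER REFERENCES repository(id),
            PRIMARY KEY (model_id, repository_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO model VALUES (?, ?)",
        [(1, "example-org/model-a"), (2, "example-org/model-b")],
    )
    conn.executemany(
        "INSERT INTO repository VALUES (?, ?)",
        [(1, "https://example.com/repo-x"), (2, "https://example.com/repo-y")],
    )
    conn.executemany(
        "INSERT INTO model_to_repository VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 2)],
    )
    conn.commit()
    return conn


def repos_per_model(conn):
    """Return (model name, number of repositories using it), most used first."""
    return conn.execute(
        """
        SELECT m.name, COUNT(l.repository_id) AS n_repos
        FROM model AS m
        LEFT JOIN model_to_repository AS l ON l.model_id = m.id
        GROUP BY m.id
        ORDER BY n_repos DESC, m.name
        """
    ).fetchall()


if __name__ == "__main__":
    for name, n_repos in repos_per_model(build_demo_db()):
        print(name, n_repos)
```

A mining question such as "which PTMs are reused most often?" then reduces to a join across the mapping table, which is the kind of query the dataset's third part is meant to support.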

mining pre-trained model, open-source software, peatmoss

Genre:
- Research Report (0.40)

Technology:
- Information Technology
  - Software (1.00)
  - Artificial Intelligence > Machine Learning
    - Neural Networks > Deep Learning (0.93)
