model repository
Charting and Navigating Hugging Face's Model Atlas
Horwitz, Eliahu, Kurer, Nitzan, Kahana, Jonathan, Amar, Liel, Hoshen, Yedid
As there are now millions of publicly available neural networks, searching and analyzing large model repositories becomes increasingly important. Navigating so many models requires an atlas, but as most models are poorly documented, charting such an atlas is challenging. To explore the hidden potential of model repositories, we chart a preliminary atlas representing the documented fraction of Hugging Face. It provides stunning visualizations of the model landscape and its evolution. We demonstrate several applications of this atlas, including predicting model attributes (e.g., accuracy) and analyzing trends in computer vision models. However, as the current atlas remains incomplete, we propose a method for charting undocumented regions. Specifically, we identify high-confidence structural priors based on dominant real-world model training practices. Leveraging these priors, our approach enables accurate mapping of previously undocumented areas of the atlas. We publicly release our datasets, code, and interactive atlas.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.87)
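As a rough illustration of how such an atlas can be seeded from the Hub's documented metadata, the sketch below builds fine-tuning edges from the `base_model` field that many model cards declare. This is only a toy approximation of the paper's charting method; the sample size and the reliance on a single metadata field are simplifying assumptions.

```python
# Sketch: approximate a "model atlas" edge list from Hugging Face metadata.
from collections import defaultdict

from huggingface_hub import HfApi, ModelCard

api = HfApi()
edges = []  # (parent_repo, child_repo) fine-tuning edges

# A small, popular sample keeps the sketch cheap; the documented Hub is far larger.
for model in api.list_models(limit=200, sort="downloads", direction=-1):
    try:
        card = ModelCard.load(model.id)
    except Exception:
        continue  # repos without a readable card form the undocumented region
    base = getattr(card.data, "base_model", None)
    if isinstance(base, str):
        edges.append((base, model.id))

children = defaultdict(list)
for parent, child in edges:
    children[parent].append(child)

# Hubs with many children are candidate "foundation" nodes of the atlas.
for parent, kids in sorted(children.items(), key=lambda kv: -len(kv[1]))[:5]:
    print(parent, len(kids))
```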
The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub
Osborne, Cailean, Ding, Jennifer, Kirk, Hannah Rose
Open model developers have emerged as key actors in the political economy of artificial intelligence (AI), but we still have a limited understanding of collaborative practices in the open AI ecosystem. This paper responds to this gap with a three-part quantitative analysis of development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models. First, various types of activity across 348,181 model, 65,761 dataset, and 156,642 space repositories exhibit right-skewed distributions. Activity is extremely imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads. Furthermore, licenses matter: there are statistically significant differences in collaboration patterns in model repositories with permissive, restrictive, and no licenses. Second, we analyse a snapshot of the social network structure of collaboration in model repositories, finding that the community has a core-periphery structure, with a core of prolific developers and a majority of isolate developers (89%). Upon removing the isolate developers from the network, collaboration is characterised by high reciprocity regardless of developers' network positions. Third, we examine model adoption through the lens of model usage in spaces, finding that a minority of models, developed by a handful of companies, are widely used on the HF Hub. Overall, activity on the HF Hub is characterised by Pareto distributions, congruent with OSS development patterns on platforms like GitHub. We conclude with recommendations for researchers, companies, and policymakers to advance our understanding of open AI development.
- Europe > France (0.28)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- (17 more...)
- Research Report > New Finding (0.94)
- Research Report > Experimental Study (0.94)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
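A minimal sketch of the kind of skew statistics the paper reports (share of zero-download models, download share of the top 1%), computed via the `huggingface_hub` client. The 500-model sample is an illustrative assumption; the paper analyses the full Hub.

```python
# Sketch: measure download concentration on a small sample of Hub models.
from huggingface_hub import HfApi

api = HfApi()
downloads = sorted(
    (m.downloads or 0) for m in api.list_models(limit=500, full=True)
)

n = len(downloads)
zero_share = sum(d == 0 for d in downloads) / n
top_1pct = downloads[-max(1, n // 100):]  # the top 1% of models by downloads
top_share = sum(top_1pct) / max(1, sum(downloads))

print(f"zero-download share: {zero_share:.1%}")
print(f"download share of top 1%: {top_share:.1%}")
```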
What's documented in AI? Systematic Analysis of 32K AI Model Cards
Liang, Weixin, Rajani, Nazneen, Yang, Xinyu, Ozoani, Ezinwanne, Wu, Eric, Chen, Yiqun, Smith, Daniel Scott, Zou, James
The rapid proliferation of AI models has underscored the importance of thorough documentation, as it enables users to understand, trust, and effectively utilize these models in various applications. Although developers are encouraged to produce model cards, it's not clear how much information or what information these cards contain. In this study, we conduct a comprehensive analysis of 32,111 AI model documentation pages on Hugging Face, a leading platform for distributing and deploying AI models. Our investigation sheds light on prevailing model card documentation practices. Most AI models with substantial downloads provide model cards, though the cards are uneven in informativeness. We find that sections addressing environmental impact, limitations, and evaluation exhibit the lowest fill-out rates, while the training section is the most consistently filled out. We analyze the content of each section to characterize practitioners' priorities. Interestingly, there are substantial discussions of data, sometimes with equal or even greater emphasis than on the model itself. To evaluate the impact of model cards, we conducted an intervention study, adding detailed model cards to 42 popular models which previously had no or sparse model cards. We find that adding model cards is moderately correlated with an increase in weekly download rates. Our study opens up a new perspective for analyzing community norms and practices for model documentation through large-scale data science and linguistic analysis.
- Europe (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (1.00)
- Education (0.93)
- Law (0.89)
- (2 more...)
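To make the "fill-out rate" analysis concrete, here is a crude sketch that checks whether a few standard model card sections contain any content. The section names follow the common Hugging Face card template, the regex matching is a stand-in for the paper's actual parsing pipeline, and the repo ids are arbitrary examples.

```python
# Sketch: estimate fill-out rates of model card sections for a few repos.
import re

from huggingface_hub import ModelCard

SECTIONS = ["Training", "Evaluation", "Limitations", "Environmental Impact"]
repos = ["bert-base-uncased", "gpt2"]  # illustrative repo ids

counts = {s: 0 for s in SECTIONS}
for repo in repos:
    text = ModelCard.load(repo).text
    for section in SECTIONS:
        # A section counts as "filled" if its heading appears and is followed
        # by non-heading text; a real analysis would parse the markdown tree.
        match = re.search(rf"^#+\s*{section}.*?\n(.+?)(?=^#|\Z)",
                          text, re.MULTILINE | re.DOTALL | re.IGNORECASE)
        if match and match.group(1).strip():
            counts[section] += 1

for section, filled in counts.items():
    print(f"{section}: {filled}/{len(repos)} cards filled")
```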
Towards Automatic Support of Software Model Evolution with Large Language Models
Tinnes, Christof, Fuchß, Thomas, Hohenstein, Uwe, Apel, Sven
Modeling the structure and behavior of software systems plays a crucial role in various areas of software engineering. As with other software engineering artifacts, software models are subject to evolution. Supporting modelers in evolving models, through model completion facilities and high-level edit operations such as frequently occurring editing patterns, is still an open problem. Recently, large language models (i.e., generative neural networks) have garnered significant attention in various research areas, including software engineering. In this paper, we explore the potential of large language models in supporting the evolution of software models in software engineering. We propose an approach that utilizes large language models for model completion and for discovering editing patterns in the model histories of software systems. Through controlled experiments using simulated model repositories, we evaluate the potential of large language models for these two tasks. We find that large language models are indeed a promising technology for supporting software model evolution, and that the area merits further investigation.
- Europe > Germany > Saarland (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
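A minimal sketch of the general idea of LLM-based model completion: serialize an edit history as text and let a generative model propose the next edit. The edit syntax and the choice of checkpoint are invented for illustration and do not reflect the paper's concrete setup.

```python
# Sketch: prompt a generative LLM to complete a software model's edit history.
from transformers import pipeline

# Any small text-generation checkpoint works for a demo; gpt2 is arbitrary.
generator = pipeline("text-generation", model="gpt2")

history = (
    "Edit 1: add class OrderService\n"
    "Edit 2: add attribute OrderService.orders : List<Order>\n"
    "Edit 3: add class PaymentService\n"
    "Edit 4:"
)

# The model continues the serialized history, proposing a candidate next edit.
completion = generator(history, max_new_tokens=40)[0]["generated_text"]
print(completion)
```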
Green Runner: A tool for efficient model selection from model repositories
Kannan, Jai, Barnett, Scott, Simmons, Anj, Selvi, Taylan, Cruz, Luis
Deep learning models have become essential in software engineering, enabling intelligent features like image captioning and document generation. However, their popularity raises concerns about environmental impact and inefficient model selection. This paper introduces GreenRunnerGPT, a novel tool for efficiently selecting deep learning models based on specific use cases. It employs a large language model to suggest weights for quality indicators, optimizing resource utilization. The tool utilizes a multi-armed bandit framework to evaluate models against target datasets, considering tradeoffs. We demonstrate that GreenRunnerGPT is able to identify a model suited to a target use case without wasteful computations that would occur under a brute-force approach to model selection.
- Europe > Netherlands > South Holland > Delft (0.05)
- Oceania > Australia (0.04)
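The multi-armed bandit core of such a selector can be sketched with UCB1, where each arm is a candidate model and a pull evaluates it on a small batch. The candidate names and the random `evaluate` stub are placeholders for a real quality-vs-cost measurement.

```python
# Sketch: UCB1 bandit over candidate models, in the spirit of GreenRunnerGPT.
import math
import random

candidates = ["model-a", "model-b", "model-c"]  # placeholder model names

def evaluate(model_name: str) -> float:
    """Placeholder: score one batch; a real tool would run inference here."""
    return random.random()

counts = {m: 0 for m in candidates}
totals = {m: 0.0 for m in candidates}

for t in range(1, 101):
    # Play each arm once, then pick by the UCB1 upper confidence bound.
    untried = [m for m in candidates if counts[m] == 0]
    if untried:
        choice = untried[0]
    else:
        choice = max(
            candidates,
            key=lambda m: totals[m] / counts[m]
            + math.sqrt(2 * math.log(t) / counts[m]),
        )
    reward = evaluate(choice)
    counts[choice] += 1
    totals[choice] += reward

best = max(candidates, key=lambda m: totals[m] / counts[m])
print("selected:", best)
```

Because evaluation stops flowing to arms whose confidence bounds fall behind, poor models are abandoned early, which is where the tool's compute savings over brute-force evaluation come from.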
GitFL: Adaptive Asynchronous Federated Learning using Version Control
Hu, Ming, Xia, Zeke, Yue, Zhihao, Xia, Jun, Huang, Yihao, Liu, Yang, Chen, Mingsong
As a promising distributed machine learning paradigm that enables collaborative training without compromising data privacy, Federated Learning (FL) has been increasingly used in AIoT (Artificial Intelligence of Things) design. However, due to the lack of efficient management of straggling devices, existing FL methods greatly suffer from low inference accuracy and long training times. Things become even worse when various uncertain factors present in AIoT scenarios (e.g., network delays, performance variance caused by process variation) are taken into account. To address this issue, this paper proposes a novel asynchronous FL framework named GitFL, whose implementation is inspired by the famous version control system Git. Unlike traditional FL, the cloud server of GitFL maintains a master model (i.e., the global model) together with a set of branch models indicating the trained local models committed by selected devices. The master model is updated based on all the pushed branch models and their version information, and only the branch models after the pull operation are dispatched to devices. By using our proposed Reinforcement Learning (RL)-based device selection mechanism, a pulled branch model with an older version is more likely to be dispatched to a faster and less frequently selected device for the next round of local training. In this way, GitFL enables both effective control of model staleness and adaptive load balancing of versioned models among straggling devices, thus avoiding performance deterioration. Comprehensive experimental results on well-known models and datasets show that, compared with state-of-the-art asynchronous FL methods, GitFL can achieve up to 2.64X training acceleration and 7.88% inference accuracy improvements in various uncertain scenarios.
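A toy sketch of the version-weighted merge step GitFL describes, where branch models committed by devices are averaged into the master model with staler versions contributing less. The weighting formula here is a plausible placeholder, not the paper's exact rule.

```python
# Sketch: merge branch models into a master model, weighted by version freshness.
import torch

def merge_master(branches: list[tuple[torch.Tensor, int]],
                 current_version: int) -> torch.Tensor:
    """branches: (parameters, version) pairs committed by selected devices."""
    weights = torch.tensor(
        [1.0 / (1 + current_version - v) for _, v in branches]
    )
    weights = weights / weights.sum()  # staler branches contribute less
    stacked = torch.stack([p for p, _ in branches])
    return (weights.view(-1, 1) * stacked).sum(dim=0)

# Three branch models at versions 5, 3, and 1; the master is at version 5.
branches = [(torch.randn(4), 5), (torch.randn(4), 3), (torch.randn(4), 1)]
print(merge_master(branches, current_version=5))
```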
Deploying a 1.3B GPT-3 Model with NVIDIA NeMo Megatron
Large language models (LLMs) are some of the most advanced deep learning algorithms capable of understanding written language. Many modern LLMs are built using the transformer network introduced by Google in 2017 in the Attention Is All You Need research paper. NVIDIA NeMo Megatron is an end-to-end GPU-accelerated framework for training and deploying transformer-based LLMs with up to a trillion parameters. In September 2022, NVIDIA announced that NeMo Megatron is now available in Open Beta, allowing you to train and deploy LLMs using your own data. With this announcement, several pretrained checkpoints have been uploaded to HuggingFace, enabling anyone to deploy LLMs locally using GPUs.
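If you want to fetch one of those pretrained checkpoints programmatically, a sketch with `huggingface_hub` follows; the repo id and file name are assumptions to verify against the actual Hugging Face listing.

```python
# Sketch: download a pretrained NeMo Megatron checkpoint from Hugging Face.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="nvidia/nemo-megatron-gpt-1.3B",  # assumed repo id
    filename="nemo_gpt1.3B_fp16.nemo",        # assumed checkpoint file name
)
print("checkpoint saved to:", path)
# Deployment itself goes through the NeMo Megatron / Triton tooling,
# which is outside the scope of this snippet.
```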
How to Deploy Machine Learning Models to the Cloud Quickly and Easily
Machine learning models are usually developed in a training environment (online or offline) and can then be deployed to be used with live data. If you're working on data science and machine learning projects, knowing how to deploy a model is one of the most important skills you'll need. Who is this article for? This article is for those who have created a machine learning model on a local machine and want to deploy and test it within a short time. It's also for those who are looking for an alternative platform on which to deploy their machine learning models.
- Information Technology (0.70)
- Media > Film (0.47)
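The pattern the article describes, exposing a locally trained model behind an HTTP endpoint you can then push to a cloud platform, can be sketched in a few lines with FastAPI. The pickle file name and feature schema are placeholders for your own model.

```python
# Sketch: serve a locally trained model behind an HTTP prediction endpoint.
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

with open("model.pkl", "rb") as f:  # model trained on your local machine
    model = pickle.load(f)

app = FastAPI()

class Features(BaseModel):
    values: list[float]

@app.post("/predict")
def predict(features: Features):
    prediction = model.predict([features.values])
    return {"prediction": prediction.tolist()}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
# then containerize or push to the cloud platform of your choice.
```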
End-to-End Recommender Systems with Merlin: Part 3
At a Glance: In Part 1, we walked through the pre-processing (ETL) pipeline for the Criteo dataset using the standard NVTabular toolkit inside the Merlin SDK. In Part 2, we covered the training procedure with the HugeCTR architecture, exploring four standard state-of-the-art architectures using the HugeCTR training toolkits inside Merlin. In this section, we explore the inference implementation, which is the final deployment step, using the Triton Inference Server. NVIDIA's Merlin contains three crucial components.
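On the client side, querying the deployed model through Triton might look like the sketch below; the model name, input tensor name, and shape are assumptions that must match the server's actual model configuration.

```python
# Sketch: send one inference request to a Triton server hosting the model.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.zeros((1, 39), dtype=np.float32)  # e.g., Criteo's 39 features
inputs = [httpclient.InferInput("DES", batch.shape, "FP32")]  # assumed input name
inputs[0].set_data_from_numpy(batch)

result = client.infer(model_name="criteo_dcn", inputs=inputs)  # assumed model name
print(result.as_numpy("OUTPUT0"))
```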
Distinguishing artefacts: evaluating the saturation point of convolutional neural networks
Real, Ric, Gopsill, James, Jones, David, Snider, Chris, Hicks, Ben
Prior work has shown that Convolutional Neural Networks (CNNs) trained on surrogate Computer Aided Design (CAD) models are able to detect and classify real-world artefacts from photographs. Applications of this support the twinning of digital and physical assets in design, including rapid extraction of part geometry from model repositories, information search & retrieval, and identifying components in the field for maintenance, repair, and recording. The performance of CNNs in classification tasks has been shown to depend on training data set size and the number of classes. Where prior works have used relatively small surrogate model data sets (<100 models), the question remains as to the ability of a CNN to differentiate between models in increasingly large model repositories. This paper presents a method for generating synthetic image data sets from online CAD model repositories, and further investigates the capacity of an off-the-shelf CNN architecture trained on synthetic data to classify models as class size increases. 1,000 CAD models were curated and processed to generate large-scale surrogate data sets, featuring model coverage at steps of 10°, 30°, 60°, and 120°. The findings demonstrate the capability of computer vision algorithms to classify artefacts in model repositories of up to 200 models; beyond this point, the CNN's performance deteriorates significantly, limiting its present ability for automated twinning of physical to digital artefacts. However, a match is more often found in the top-5 results, showing potential for information search and retrieval on large repositories of surrogate models.
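A sketch of the "off-the-shelf CNN on synthetic renders" setup: a pretrained torchvision ResNet with its classification head resized to the number of CAD classes, trained on a folder of rendered views. The directory layout, class count, and single training pass are simplifying assumptions.

```python
# Sketch: fine-tune a pretrained CNN to classify renders of CAD models.
import torch
from torch import nn
from torchvision import datasets, models, transforms

NUM_CLASSES = 200  # around the saturation point reported above

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# One folder per CAD model, filled with renders at varying viewing angles.
train_set = datasets.ImageFolder("renders/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # resize the head

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one pass is enough for a sketch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```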