AITopics | documentation practice

Completeness of Datasets Documentation on ML/AI repositories: an Empirical Investigation

Rondina, Marco, Vetrò, Antonio, De Martin, Juan Carlos

arXiv.org Artificial Intelligence, Feb-10-2025

ML/AI is the field of computer science and computer engineering that has arguably received the most attention and funding over the last decade. Data is the key element of ML/AI, so it is becoming increasingly important to ensure that users are fully aware of the quality of the datasets that they use, and of the process generating them, so that possible negative downstream impacts can be tracked, analysed, and, where possible, mitigated. One of the tools that can be useful in this perspective is dataset documentation. The aim of this work is to investigate the state of dataset documentation practices, measuring the completeness of the documentation of several popular datasets in ML/AI repositories. We created a dataset documentation schema -- the Documentation Test Sheet (DTS) -- that identifies the information that should always be attached to a dataset (to ensure proper dataset choice and informed use), according to relevant studies in the literature. We verified 100 popular datasets from four different repositories with the DTS to investigate which information was present. Overall, we observed a lack of relevant documentation, especially about the context of data collection and data processing, highlighting a paucity of transparency.

dataset, information, repository, (11 more...)

doi: 10.1007/978-3-031-49008-8_7

2503.13463

Country:

North America > United States (0.15)
Europe > Italy (0.14)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Machine Learning (0.48)

Documentation Practices of Artificial Intelligence

Arnold, Stefan, Yesilbas, Dilara, Gröbner, Rene, Riedelbauch, Dominik, Horn, Maik, Weinzierl, Sven

arXiv.org Artificial Intelligence, Jun-26-2024

Artificial Intelligence (AI) faces persistent challenges in terms of transparency and accountability, which requires rigorous documentation. Through a literature review on documentation practices, we provide an overview of prevailing trends, persistent issues, and the multifaceted interplay of factors influencing the documentation. Our examination of key characteristics such as scope, target audiences, support for multimodality, and level of automation, highlights a dynamic evolution in documentation practices, underscored by a shift towards a more holistic, engaging, and automated documentation.

ai documentation, documentation, documentation practice, (16 more...)

2406.1862

Country: Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre: Overview (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.94)

Navigating Dataset Documentations in AI: A Large-Scale Analysis of Dataset Cards on Hugging Face

Yang, Xinyu, Liang, Weixin, Zou, James

arXiv.org Artificial Intelligence, Jan-24-2024

Advances in machine learning are closely tied to the creation of datasets. While data documentation is widely recognized as essential to the reliability, reproducibility, and transparency of ML, we lack a systematic empirical understanding of current dataset documentation practices. To shed light on this question, here we take Hugging Face -- one of the largest platforms for sharing and collaborating on ML models and datasets -- as a prominent case study. By analyzing all 7,433 dataset cards on Hugging Face, our investigation provides an overview of the Hugging Face dataset ecosystem and insights into dataset documentation practices, yielding 5 main findings: (1) The dataset card completion rate shows marked heterogeneity correlated with dataset popularity. (2) A granular examination of each section within the dataset card reveals that practitioners seem to prioritize the Dataset Description and Dataset Structure sections, while the Considerations for Using the Data section receives the lowest proportion of content. (3) By analyzing the subsections within each section and utilizing topic modeling to identify key topics, we uncover what is discussed in each section, and underscore significant themes encompassing both technical and social impacts, as well as limitations within the Considerations for Using the Data section. (4) Our findings also highlight the need for improved accessibility and reproducibility of datasets in the Usage sections. (5) In addition, our human annotation evaluation emphasizes the pivotal role of comprehensive dataset content in shaping individuals' perceptions of a dataset card's overall quality. Overall, our study offers a unique perspective on analyzing dataset documentation through large-scale data science analysis and underlines the need for more thorough dataset documentation in machine learning research.

artificial intelligence, machine learning, natural language, (19 more...)

2401.13822

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.68)
Research Report > New Finding (0.48)

Industry: Social Sector (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

The State of Documentation Practices of Third-party Machine Learning Models and Datasets

Oreamuno, Ernesto Lang, Khan, Rohan Faiyaz, Bangash, Abdul Ali, Stinson, Catherine, Adams, Bram

arXiv.org Artificial Intelligence, Dec-22-2023

Model stores offer third-party ML models and datasets for easy project integration, minimizing coding efforts. One might hope to find detailed specifications of these models and datasets in the documentation, leveraging documentation standards such as model and dataset cards. In this study, we use statistical analysis and hybrid card sorting to assess the state of the practice of documenting model cards and dataset cards in one of the largest model stores in use today -- Hugging Face (HF). Our findings show that only 21,902 models (39.62%) and 1,925 datasets (28.48%) have documentation. Furthermore, we observe inconsistency in ethics and transparency-related documentation for ML models and datasets.

artificial intelligence, deep learning, machine learning, (17 more...)

2312.15058

Country:

North America > United States (0.28)
North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
