A Supplementary Material

Amelia Jimenez Sanchez

Neural Information Processing Systems 

A.1 Data Cards Table 2 shows the extracted documentation parameters from Kaggle and HuggingFace, which we categorized according to Datasheets [40]. On HuggingFace, we find information about the annotation creators (e.g., crowdsource, experts, ml-generated) or specific task categories (e.g., image-classification, image-to-text, text-to-image). Such parameters can be used to filter results when searching on HuggingFace, potentially enabling systematic analysis of a specific task or tag. On Kaggle, we notice that some important parameters shown in the dataset website such as temporal and geospatial coverage, data collection methodology, provenance, DOI citation, and update frequency cannot be automatically extracted with their API, so we manually included them. Kaggle automatically computes a usability score, which is associated with the tag "well-documented", and used for ranking results when searching for a dataset.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found