Croissant: A Metadata Format for ML-Ready Datasets

Akhtar, Mubashara; Benjelloun, Omar; Conforti, Costanza; Gijsbers, Pieter; Giner-Miguelez, Joan; Jain, Nitisha; Kuchnik, Michael; Lhoest, Quentin; Marcenac, Pierre; Maskey, Manil; Mattson, Peter; Oala, Luis; Ruyssen, Pierre; Shinde, Rajat; Simperl, Elena; Thomas, Goeffry; Tykhonov, Slava; Vanschoren, Joaquin; van der Velde, Jos; Vogler, Steffen; Wu, Carole-Jean

arXiv.org Artificial Intelligence 

Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.
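Croissant descriptions are JSON-LD documents built on the schema.org vocabulary. As a rough illustration of how such metadata lets a tool find and type the underlying data, the sketch below describes a tiny CSV dataset and follows each field's source back to its file and column. The key names reflect a simplified reading of the Croissant 1.0 layout and have not been run through a validator. The dataset name, file, URL, and the `field_sources` helper are all hypothetical and are not part of any Croissant library.

```python
import json

# Hypothetical, simplified Croissant-style description of a small CSV dataset.
# Assumption: key names follow the general Croissant layout (schema.org terms
# plus a "cr" namespace); this is an illustrative sketch, not a validated file.
metadata = {
    "@context": {
        "@vocab": "https://schema.org/",
        "sc": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
    },
    "@type": "sc:Dataset",
    "name": "example_flowers",  # hypothetical dataset name
    "conformsTo": "http://mlcommons.org/croissant/1.0",
    # Resources: where the raw bytes live.
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "flowers.csv",
            "contentUrl": "https://example.com/flowers.csv",  # placeholder URL
            "encodingFormat": "text/csv",
        }
    ],
    # Structure: how records and typed fields are extracted from the resources.
    "recordSet": [
        {
            "@type": "cr:RecordSet",
            "@id": "flowers",
            "field": [
                {
                    "@type": "cr:Field",
                    "@id": "flowers/petal_length",
                    "dataType": "sc:Float",
                    "source": {
                        "fileObject": {"@id": "flowers.csv"},
                        "extract": {"column": "petal_length"},
                    },
                },
                {
                    "@type": "cr:Field",
                    "@id": "flowers/species",
                    "dataType": "sc:Text",
                    "source": {
                        "fileObject": {"@id": "flowers.csv"},
                        "extract": {"column": "species"},
                    },
                },
            ],
        }
    ],
}


def field_sources(meta):
    """Hypothetical helper: map each field id to (content URL, column, data type)."""
    # Index the declared files by their @id so fields can reference them.
    files = {f["@id"]: f for f in meta.get("distribution", [])}
    resolved = {}
    for record_set in meta.get("recordSet", []):
        for field in record_set.get("field", []):
            src = field["source"]
            file_obj = files[src["fileObject"]["@id"]]
            resolved[field["@id"]] = (
                file_obj["contentUrl"],
                src["extract"]["column"],
                field["dataType"],
            )
    return resolved


if __name__ == "__main__":
    # The metadata is plain JSON, so it can be serialized and shipped with the data.
    print(json.dumps(metadata, indent=2))
    for field_id, info in field_sources(metadata).items():
        print(field_id, info)
```

Separating resources (`distribution`) from structure (`recordSet`) is what lets a generic loader locate data and assign types without dataset-specific code. Real Croissant tooling does considerably more (validation, file sets, transformations); this sketch only illustrates the idea.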