A FineWeb Datasheet

Dataset Details

Purpose of the dataset

Neural Information Processing Systems 

We released FineWeb to make large language model training more accessible to the broader machine learning community. The dataset was curated and funded by Hugging Face. It is released under the Open Data Commons Attribution License (ODC-By) v1.0, and its use is also subject to Common Crawl's Terms of Use.
