A FineWeb Datasheet Dataset Details Purpose of the dataset
–Neural Information Processing Systems
We released FineWeb to make large language model training more accessible to the machine learning community at large. The dataset was curated and funded by Hugging Face. It is released under the Open Data Commons Attribution License (ODC-By) v1.0, and its use is also subject to Common Crawl's Terms of Use.
Mar-19-2025, 16:38:48 GMT