OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Hugo Laurençon*,1,2 Lucile Saulnier*,1 Léo Tronchon
–Neural Information Processing Systems
We describe the dataset creation process, present comprehensive filtering rules, and provide an analysis of the dataset's content.
Oct-9-2025, 09:54:56 GMT
- Country:
- Asia
- Japan > Honshū
- Chūbu > Toyama Prefecture > Toyama (0.04)
- Middle East > Jordan (0.04)
- Europe
- Spain > Valencian Community
- Valencia Province > Valencia (0.04)
- United Kingdom > Scotland
- City of Edinburgh > Edinburgh (0.04)
- North America
- Genre:
- Research Report > New Finding (0.46)
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Text Processing (0.46)
- Vision (1.00)
- Communications (1.00)