Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

Apr-25-2026, 14:22:33 GMT–Neural Information Processing Systems

This format enables not only few-shot learning via interleaved, independent supervised (image, text) examples, but also more complex prompts involving interaction between images, e.g., "What do image A and image B have in common?" To support this interface, pretraining occurs over web corpora that similarly contain interleaved images and text. To date, however, large-scale data of this form have not been publicly available. We release Multimodal C4 (mmc4), an augmentation of the popular text-only c4 corpus with images interleaved. We use a linear assignment algorithm to place images into longer bodies of text using CLIP features [24], a process that we show outperforms alternatives.
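As a rough illustration of the placement step described above, the following sketch assigns each image to a distinct sentence so that total image-sentence similarity is maximized. It is not the authors' implementation. The function names (`cosine`, `assign_images`), the toy 2-d vectors standing in for CLIP embeddings, and the constraint that each sentence receives at most one image are all assumptions made for illustration. The assignment is solved by exhaustive search, which is practical only for tiny inputs; a real pipeline would more likely use a dedicated solver such as `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def assign_images(image_vecs, sentence_vecs):
    """Return, for each image, the index of the sentence it is placed with.

    Hypothetical helper: solves the linear assignment problem by brute force,
    maximizing the summed similarity with at most one image per sentence.
    """
    n_img, n_sent = len(image_vecs), len(sentence_vecs)
    if n_img > n_sent:
        raise ValueError("this sketch assumes no more images than sentences")

    # Similarity matrix: rows are images, columns are sentences.
    sim = [[cosine(img, sent) for sent in sentence_vecs] for img in image_vecs]

    # Enumerate every way to give each image its own sentence; keep the best.
    best = max(
        permutations(range(n_sent), n_img),
        key=lambda cols: sum(sim[i][j] for i, j in enumerate(cols)),
    )
    return list(best)


# Toy stand-ins for CLIP features (assumed values, not real embeddings).
toy_images = [[1.0, 0.0], [0.9, 0.1]]
toy_sentences = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
placement = assign_images(toy_images, toy_sentences)
```

Both toy images are most similar to the first sentence, so picking each image's best sentence independently would place them on the same one. Solving the assignment jointly gives the first sentence to the first image and moves the second image to its next-best sentence.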



