MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
Neural Information Processing Systems
Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs).
Oct-10-2025, 00:27:22 GMT
- Country:
- Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Monaco (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Industry:
- Government (0.67)
- Information Technology (1.00)
- Law (1.00)
- Technology: