Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark

Jan-18-2025, 11:02:46 GMT – Neural Information Processing Systems

Vision-Language Pre-training (VLP) models have shown remarkable performance on various downstream tasks. Their success relies heavily on the scale of the cross-modal datasets used for pre-training. However, the lack of large-scale datasets and benchmarks in Chinese hinders the development of Chinese VLP models and of broader multilingual applications. In this work, we release Wukong, a large-scale Chinese cross-modal dataset containing 100 million Chinese image-text pairs collected from the web. Wukong aims to benchmark different multi-modal pre-training methods and thereby facilitate VLP research and community development.

chinese cross-modal pre-training benchmark, dataset, million large-scale chinese cross-modal pre-training, (4 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)