Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Neural Information Processing Systems
Vision-Language Pre-training (VLP) models have shown remarkable performance on various downstream tasks. Their success relies heavily on the scale of the cross-modal datasets used for pre-training. However, the lack of large-scale Chinese datasets and benchmarks hinders the development of Chinese VLP models and broader multilingual applications. In this work, we release Wukong, a large-scale Chinese cross-modal dataset containing 100 million Chinese image-text pairs collected from the web. Wukong aims to benchmark different multi-modal pre-training methods and thereby facilitate VLP research and community development.
artificial intelligence, machine learning, million large-scale chinese cross-modal pre-training
Jan-18-2025, 11:02:46 GMT