Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
–Neural Information Processing Systems
The success of vision-language pre-training (VLP) models relies heavily on the scale of the cross-modal datasets used for pre-training. However, the lack of large-scale Chinese datasets and benchmarks hinders the development of Chinese VLP models and of broader multilingual applications. In this work, we release Wukong, a large-scale Chinese cross-modal dataset containing 100 million Chinese image-text pairs collected from the web.
Feb-11-2026, 05:56:24 GMT
- Technology: