nvBench: A Large-Scale Synthesized Dataset for Cross-Domain Natural Language to Visualization Task

Luo, Yuyu; Tang, Jiawei; Li, Guoliang

arXiv.org Artificial Intelligence 

NL2VIS, which translates natural language (NL) queries to corresponding visualizations (VIS), has attracted more and more attention from both commercial visualization vendors and academia. In the last few years, advanced deep learning-based models have achieved human-like abilities in many natural language processing (NLP) tasks, which strongly suggests that deep learning-based techniques are a good choice to push the field of NL2VIS forward. However, a major obstacle is the lack of benchmarks with large numbers of (NL, VIS) pairs.

Since the release of nvBench in 2021, several deep learning-based models have been developed to support translating natural language queries into visualizations. The key factor in making this a success is acquiring enough high-quality (NL, VIS) pairs, because deep learning models require large-scale, high-quality training data. In this paper, we present such a benchmark, namely nvBench [18], which contains 25,750 (NL, VIS) pairs over 750 tables from 105 domains to support the cross-domain NL2VIS task. Unlike the common practice of building such a benchmark by manually designing and collecting enough data and queries, we synthesize