SynChart: Synthesizing Charts from Language Models
Mengchen Liu, Qixiu Li, Dongdong Chen, Dong Chen, Jianmin Bao, Yunsheng Li
Since the release of GPT-4V(O), using it to generate pseudo labels for multi-modality tasks has become increasingly popular [1]. While we often "stand on the shoulders of giants," the process of building the giant itself, specifically constructing GPT-4V(O) from its foundational large language model (LLM), GPT-4, remains a mystery. In this work, we explore the potential of using LLMs alone to build a competitive multi-modality model. Given budget constraints, we focus on a specific domain, chart understanding, rather than building a general multi-modality model. Since the quantity and quality of data are key determinants of model performance, this work focuses on building a large-scale chart dataset and applying well-established training pipelines. There are two major challenges in constructing such a dataset: first, collecting a diverse set of chart images, and second, the more critical and difficult task of obtaining high-quality labels for these images.
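The abstract does not spell out the synthesis pipeline, but the core idea behind synthetic chart data (the generator owns the underlying table, so exact labels come for free) can be illustrated with a small sketch. The function name synthesize_chart, the bar-chart template, and the QA format below are hypothetical illustrations, not the paper's actual pipeline; in SynChart an LLM would presumably produce the data, the plotting code, and richer annotations.

```python
# Hypothetical sketch of one chart-synthesis step (not the authors' code).
# The underlying table is sampled at random and the pseudo labels
# (question-answer pairs) are derived directly from that table, so they
# are correct by construction.
import json
import random

import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt


def synthesize_chart(path: str, n_bars: int = 5) -> dict:
    """Render one synthetic bar chart and return its pseudo labels."""
    categories = [f"Item {chr(ord('A') + i)}" for i in range(n_bars)]
    values = [round(random.uniform(10, 100), 1) for _ in range(n_bars)]

    # Render the chart image.
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(categories, values)
    ax.set_title("Synthetic sales by item")
    ax.set_ylabel("Sales")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)

    # Labels are exact because we control the underlying data.
    max_idx = values.index(max(values))
    return {
        "image": path,
        "table": dict(zip(categories, values)),
        "qa_pairs": [
            {"q": "Which item has the highest sales?", "a": categories[max_idx]},
            {"q": f"What is the value of {categories[0]}?", "a": values[0]},
        ],
    }


if __name__ == "__main__":
    sample = synthesize_chart("chart_000.png")
    print(json.dumps(sample, indent=2))
```

Looping such a generator over many chart types and styles would address the first challenge (image diversity), while deriving annotations from the known source table addresses the second (label quality) without needing a multi-modality teacher model.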
arXiv.org Artificial Intelligence
Sep-24-2024