Luo, Yuyu
AskChart: Universal Chart Understanding through Textual Enhancement
Yang, Xudong, Wu, Yifan, Zhu, Yizhang, Tang, Nan, Luo, Yuyu
Chart understanding tasks such as ChartQA and Chart-to-Text involve automatically extracting and interpreting key information from charts, enabling users to query or convert visual data into structured formats. State-of-the-art approaches primarily focus on visual cues from chart images, failing to explicitly incorporate rich textual information (e.g., data labels and axis labels) embedded within the charts. This textual information is vital for intuitive human comprehension and interpretation of charts. Moreover, existing models are often large and computationally intensive, limiting their practical applicability. In this paper, we introduce AskChart, a universal model that explicitly integrates both textual and visual cues from charts using a Mixture of Experts (MoE) architecture. AskChart facilitates the learning of enhanced visual-textual representations of charts for effectively handling multiple chart understanding tasks, while maintaining a smaller model size. To capture the synergy between visual and textual modalities, we curate a large-scale dataset named ChartBank with about 7.5M data samples, which helps align textual and visual information and facilitates the extraction of visual entities and text. To effectively train AskChart, we design a three-stage training strategy to align visual and textual modalities for learning robust visual-textual representations and optimizing the learning of the MoE layer. Extensive experiments across five datasets demonstrate the significant performance gains of AskChart in four chart understanding tasks. Remarkably, AskChart with 4.6B parameters outperforms state-of-the-art models with 13B parameters by 68.3% in Open-ended ChartQA and 49.2% in Chart-to-Text tasks, while achieving comparable performance in ChartQA and Chart-to-Table tasks.
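The abstract's central architectural idea is a sparse Mixture of Experts (MoE) layer that routes each token to a small subset of expert networks. Below is a minimal sketch of a generic top-k-routed MoE feed-forward layer in PyTorch; the dimensions, expert count, and routing scheme are illustrative assumptions, not AskChart's actual implementation.

```python
# Minimal sparse Mixture-of-Experts feed-forward layer (illustrative sketch;
# dimensions, expert count, and top-k routing are assumptions, not AskChart's).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=768, d_hidden=3072, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # router over experts
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.gate(x)                           # (B, S, E)
        weights, idx = scores.topk(self.top_k, dim=-1)  # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Only the selected top-k experts run per token, which is how an MoE model can keep its active parameter count (and inference cost) small relative to its total size.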
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?
Liu, Xinyu, Shen, Shuyu, Li, Boyan, Ma, Peixian, Jiang, Runzhi, Zhang, Yuxin, Fan, Ju, Li, Guoliang, Tang, Nan, Luo, Yuyu
Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL, a.k.a. Text-to-SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of NL2SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of NL2SQL techniques powered by LLMs, covering the entire lifecycle from four aspects: (1) Model: NL2SQL translation techniques that tackle not only NL ambiguity and under-specification, but also properly map NL to the database schema and instances; (2) Data: from the collection of training data and data synthesis (to address training-data scarcity) to NL2SQL benchmarks; (3) Evaluation: evaluating NL2SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: analyzing NL2SQL errors to find root causes and guide NL2SQL models to evolve. Moreover, we provide a rule of thumb for developing NL2SQL solutions. Finally, we discuss the research challenges and open problems of NL2SQL in the LLM era.
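As context for the Model aspect, LLM-based NL2SQL methods typically serialize the database schema into the prompt so the model can ground NL terms in concrete tables and columns. The snippet below is a minimal, generic sketch of that prompting pattern; the template wording and example schema are assumptions, not a specific system from the survey.

```python
# Generic schema-aware NL2SQL prompt construction (a sketch of the common
# prompting pattern, not any particular system's template).
def build_nl2sql_prompt(question: str, schema: dict[str, list[str]]) -> str:
    # Serialize the schema so the LLM can map NL terms to tables/columns.
    schema_text = "\n".join(
        f"Table {t}({', '.join(cols)})" for t, cols in schema.items()
    )
    return (
        "Translate the question into a single SQL query.\n"
        f"Database schema:\n{schema_text}\n"
        f"Question: {question}\n"
        "SQL:"
    )

prompt = build_nl2sql_prompt(
    "How many employees joined after 2020?",
    {"employees": ["id", "name", "hire_date", "dept_id"]},
)
print(prompt)  # feed this to any LLM completion endpoint
```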
AFlow: Automating Agentic Workflow Generation
Zhang, Jiayi, Xiang, Jinyu, Yu, Zhaoyang, Teng, Fengwei, Chen, Xionghui, Chen, Jiaqi, Zhuge, Mingchen, Cheng, Xin, Hong, Sirui, Wang, Jinlin, Zheng, Bingnan, Liu, Bang, Luo, Yuyu, Wu, Chenglin
Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains, typically by employing agentic workflows that follow detailed instructions and operational sequences. However, constructing these workflows requires significant human effort, which limits the scalability and adaptability of LLMs to new, complex domains and hinders their ability to transfer skills across diverse tasks (Tang et al., 2024). Recent research has sought to automate the generation and optimization of these workflows (Khattab et al., 2024; Yüksekgönül et al., 2024; Liu et al., 2023; Hu et al., 2024), but existing methods still rely on initial manual setup and fall short of achieving fully automated and effective workflow generation. To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. The code will be available at https://github.com/geekan/MetaGPT.
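To make "code-represented workflows" concrete, the toy sketch below expresses a generate-critique-revise workflow as LLM-invoking nodes connected by edges; the node prompts and the llm() stub are illustrative assumptions, not AFlow's actual operators.

```python
# Toy code-represented workflow: LLM-invoking nodes connected by edges, the
# kind of search space the paper optimizes over. The llm() stub and prompts
# are illustrative assumptions, not AFlow's actual operators.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    return f"<answer to: {prompt}>"  # stand-in for a real LLM call

@dataclass
class Node:
    name: str
    prompt_template: str
    successors: list["Node"] = field(default_factory=list)

    def run(self, context: str) -> str:
        output = llm(self.prompt_template.format(context=context))
        for nxt in self.successors:  # pass output along outgoing edges
            output = nxt.run(output)
        return output

# generate -> critique -> revise, expressed as an editable graph
revise = Node("revise", "Revise the draft using this critique: {context}")
critique = Node("critique", "List flaws in: {context}", [revise])
generate = Node("generate", "Solve the task: {context}", [critique])
print(generate.run("Write a haiku about search."))
```

Because the workflow is ordinary code, a search procedure can mutate nodes, prompts, or edges and re-execute it to score each candidate.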
HAIChart: Human and AI Paired Visualization System
Xie, Yupeng, Luo, Yuyu, Li, Guoliang, Tang, Nan
The growing importance of data visualization in business intelligence and data science emphasizes the need for tools that can efficiently generate meaningful visualizations from large datasets. Existing tools fall into two main categories: human-powered tools (e.g., Tableau and PowerBI), which require intensive expert involvement, and AI-powered automated tools (e.g., Draco and Table2Charts), which often fall short of anticipating specific user needs. In this paper, we aim to achieve the best of both worlds. Our key idea is to initially auto-generate a set of high-quality visualizations to minimize manual effort, then refine this process iteratively with user feedback to more closely align with users' needs. To this end, we present HAIChart, a reinforcement learning-based framework designed to iteratively recommend good visualizations for a given dataset by incorporating user feedback. Specifically, we propose a Monte Carlo Graph Search-based visualization generation algorithm paired with a composite reward function to efficiently explore the visualization space and automatically generate good visualizations. We devise a visualization hints mechanism to actively incorporate user feedback, thus progressively refining the visualization generation module. We further prove that the top-k visualization hints selection problem is NP-hard and design an efficient algorithm. We conduct both quantitative evaluations and user studies, showing that HAIChart significantly outperforms state-of-the-art human-powered tools (21% better recall and 1.8 times faster) and AI-powered automatic tools (25.1% and 14.9% better in terms of Hit@3 and R10@30, respectively).
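Since the abstract flags top-k hint selection as NP-hard, a natural baseline is a greedy heuristic for coverage-style objectives. The sketch below assumes each hint "covers" a set of candidate visualizations; the objective and data are illustrative assumptions, not necessarily HAIChart's actual algorithm.

```python
# Greedy approximation for top-k hint selection under an assumed
# coverage-style objective (not necessarily HAIChart's algorithm).
def greedy_topk_hints(hints: dict[str, set[str]], k: int) -> list[str]:
    covered: set[str] = set()
    chosen: list[str] = []
    for _ in range(k):
        # Pick the hint covering the most not-yet-covered visualizations.
        best = max(hints, key=lambda h: len(hints[h] - covered), default=None)
        if best is None or not (hints[best] - covered):
            break
        chosen.append(best)
        covered |= hints.pop(best)
    return chosen

hints = {
    "x=month":   {"v1", "v2", "v3"},
    "agg=mean":  {"v2", "v4"},
    "mark=line": {"v3", "v5"},
}
print(greedy_topk_hints(hints, k=2))  # -> ['x=month', 'agg=mean'] (tie broken by dict order)
```

For monotone submodular objectives such as coverage, this greedy scheme carries the classic (1 - 1/e) approximation guarantee, which is why it is a standard fallback for NP-hard selection problems of this shape.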
Are Large Language Models Good Statisticians?
Zhu, Yizhang, Du, Shiyin, Li, Boyan, Luo, Yuyu, Tang, Nan
Large Language Models (LLMs) have demonstrated impressive capabilities across a range of scientific tasks including mathematics, physics, and chemistry. Despite their successes, the effectiveness of LLMs in handling complex statistical tasks remains systematically under-explored. To bridge this gap, we introduce StatQA, a new benchmark designed for statistical analysis tasks. StatQA comprises 11,623 examples tailored to evaluate LLMs' proficiency in specialized statistical tasks and their applicability assessment capabilities, particularly for hypothesis testing methods. We systematically experiment with representative LLMs using various prompting strategies and show that even state-of-the-art models such as GPT-4o achieve a best performance of only 64.83%, indicating significant room for improvement. Notably, while open-source LLMs (e.g., LLaMA-3) show limited capability, their fine-tuned counterparts exhibit marked improvements, outperforming all in-context learning-based methods (e.g., GPT-4o). Moreover, our comparative human experiments highlight a striking contrast in error types between LLMs and humans: LLMs primarily make applicability errors, whereas humans mostly make statistical task confusion errors. This divergence highlights distinct areas of proficiency and deficiency, suggesting that combining LLM and human expertise could lead to complementary strengths, inviting further investigation into their collaborative potential.
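To illustrate the kind of "applicability assessment" StatQA probes (choosing a statistical method whose preconditions actually hold), the sketch below applies a textbook decision rule with SciPy: check normality before choosing between a parametric t-test and a non-parametric Mann-Whitney U test. The rule is a common heuristic used here for exposition, not StatQA's grading logic.

```python
# Textbook applicability check: pick a two-sample test whose preconditions
# hold. A common heuristic for illustration, not StatQA's grading logic.
from scipy import stats

def compare_groups(a, b, alpha=0.05):
    # Shapiro-Wilk normality check on both samples (a usual precondition
    # for the parametric t-test).
    normal = (stats.shapiro(a).pvalue > alpha and
              stats.shapiro(b).pvalue > alpha)
    if normal:
        test, res = "t-test", stats.ttest_ind(a, b)
    else:
        test, res = "Mann-Whitney U", stats.mannwhitneyu(a, b)
    return test, res.pvalue

a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
b = [5.6, 5.8, 5.5, 5.9, 5.7, 6.0]
print(compare_groups(a, b))
```

An "applicability error" in the abstract's sense would be, for example, running the t-test on heavily skewed data where the normality precondition fails.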
Generative AI for Visualization: State of the Art and Future Directions
Ye, Yilin, Hao, Jianing, Hou, Yihan, Wang, Zhan, Xiao, Shishi, Luo, Yuyu, Zeng, Wei
Generative AI (GenAI) has witnessed remarkable progress in recent years and demonstrated impressive performance in various generation tasks in domains such as computer vision and computational design. Many researchers have attempted to integrate GenAI into visualization frameworks, leveraging its superior generative capacity for different operations. Concurrently, recent major breakthroughs in GenAI, such as diffusion models and large language models, have drastically increased the potential of GenAI4VIS. From a technical perspective, this paper looks back on previous visualization studies leveraging GenAI and discusses the challenges and opportunities for future research. Specifically, we cover the applications of different types of GenAI methods, including sequence, tabular, spatial, and graph generation techniques, for different visualization tasks, which we summarize into four major stages: data enhancement, visual mapping generation, stylization, and interaction. For each specific visualization sub-task, we illustrate the typical data and concrete GenAI algorithms, aiming to provide an in-depth understanding of state-of-the-art GenAI4VIS techniques and their limitations. Furthermore, based on the survey, we discuss three major aspects of challenges and research opportunities: evaluation, datasets, and the gap between end-to-end GenAI and generative algorithms. By summarizing different generation algorithms, their current applications, and their limitations, this paper endeavors to provide useful insights for future GenAI4VIS research.
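As a concrete instance of the "visual mapping generation" stage, the sketch below maps a parsed user intent to a declarative Vega-Lite specification. It is rule-based purely for illustration, and the function and field names are assumptions; the GenAI4VIS systems surveyed learn this mapping instead of hard-coding it.

```python
# Toy "visual mapping generation": parsed intent -> declarative Vega-Lite
# spec. Rule-based for illustration only; surveyed systems learn this mapping.
import json

def intent_to_vegalite(x: str, y: str, mark: str = "bar") -> dict:
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "mark": mark,
        "encoding": {
            "x": {"field": x, "type": "nominal"},
            "y": {"field": y, "type": "quantitative", "aggregate": "sum"},
        },
    }

print(json.dumps(intent_to_vegalite("month", "sales", "line"), indent=2))
```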
VerifAI: Verified Generative AI
Tang, Nan, Yang, Chenyu, Fan, Ju, Cao, Lei, Luo, Yuyu, Halevy, Alon
Generative AI has made significant strides, yet concerns about the accuracy and reliability of its outputs continue to grow. Such inaccuracies can have serious consequences such as inaccurate decision-making, the spread of false information, privacy violations, legal liabilities, and more. Although efforts to address these risks are underway, including explainable AI and responsible AI practices such as transparency, privacy protection, bias mitigation, and social and environmental responsibility, misinformation caused by generative AI will remain a significant challenge. We propose that verifying the outputs of generative AI from a data management perspective is an emerging issue for generative AI. This involves analyzing the underlying data from multi-modal data lakes, including text files, tables, and knowledge graphs, and assessing its quality and consistency. By doing so, we can establish a stronger foundation for evaluating the outputs of generative AI models. Such an approach can ensure the correctness of generative AI, promote transparency, and enable decision-making with greater confidence. Our vision is to promote the development of verifiable generative AI and contribute to a more trustworthy and responsible use of AI.
nvBench: A Large-Scale Synthesized Dataset for Cross-Domain Natural Language to Visualization Task
Luo, Yuyu, Tang, Jiawei, Li, Guoliang
After the release of nvBench in 2021, several deep learning-based models have been developed to support translating natural language queries into visualizations. NL2VIS, which translates natural language (NL) queries to corresponding visualizations (VIS), has attracted more and more attention from both commercial visualization vendors and academic researchers. In the last few years, advanced deep learning-based models have achieved human-like abilities in many natural language processing (NLP) tasks, which suggests that deep learning is a promising technique for pushing the field of NL2VIS forward. However, a major obstacle is the lack of benchmarks with a large number of (NL, VIS) pairs: the key factor in making this a success is acquiring enough high-quality (NL, VIS) pairs, because deep learning models require large-scale, high-quality training data. In this paper, we present such a benchmark, namely nvBench [18], which contains 25,750 (NL, VIS) pairs over 750 tables from 105 domains to support the cross-domain NL2VIS task. Different from the common practice of building such a benchmark by manually designing and collecting data and queries, we synthesize the (NL, VIS) pairs.
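To show what one synthesized (NL, VIS) pair might look like, the record below pairs an NL query with a declarative chart query; the field names and the VQL-style syntax are assumptions for exposition, not nvBench's exact schema.

```python
# Illustrative shape of one (NL, VIS) pair; field names and VQL-style syntax
# are assumptions for exposition, not nvBench's exact JSON schema.
pair = {
    "nl_query": "Show the number of employees in each department as a bar chart.",
    "vis_query": ("Visualize BAR "
                  "SELECT dept_name, COUNT(*) FROM employees GROUP BY dept_name"),
    "table": "employees",
    "hardness": "easy",
}
print(pair["vis_query"])
```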