Huang, Jiajia
AuditWen:An Open-Source Large Language Model for Audit
Huang, Jiajia, Zhu, Haoran, Xu, Chao, Zhan, Tianming, Xie, Qianqian, Huang, Jimin
Intelligent auditing represents a crucial advancement in modern audit practices, enhancing both the quality and efficiency of audits within the realm of artificial intelligence. With the rise of large language model (LLM), there is enormous potential for intelligent models to contribute to audit domain. However, general LLMs applied in audit domain face the challenges of lacking specialized knowledge and the presence of data biases. To overcome these challenges, this study introduces AuditWen, an open-source audit LLM by fine-tuning Qwen with constructing instruction data from audit domain. We first outline the application scenarios for LLMs in the audit and extract requirements that shape the development of LLMs tailored for audit purposes. We then propose an audit LLM, called AuditWen, by fine-tuning Qwen with constructing 30k instruction dataset from 15 audit tasks and 3 layers. In evaluation stage, we proposed a benchmark with 5k instructions that covers a set of critical audit tasks derived from the application scenarios. With the benchmark, we compare AuditWen with other existing LLMs from information extraction, question answering and document generation. The experimental results demonstrate superior performance of AuditWen both in question understanding and answer generation, making it an immediately valuable tool for audit.
FinBen: A Holistic Financial Benchmark for Large Language Models
Xie, Qianqian, Han, Weiguang, Chen, Zhengyu, Xiang, Ruoyu, Zhang, Xiao, He, Yueru, Xiao, Mengxi, Li, Dong, Dai, Yongfu, Feng, Duanyu, Xu, Yijing, Kang, Haoqiang, Kuang, Ziyan, Yuan, Chenhan, Yang, Kailai, Luo, Zheheng, Zhang, Tianlin, Liu, Zhiwei, Xiong, Guojun, Deng, Zhiyang, Jiang, Yuechen, Yao, Zhiyuan, Li, Haohang, Yu, Yangyang, Hu, Gang, Huang, Jiajia, Liu, Xiao-Yang, Lopez-Lira, Alejandro, Wang, Benyou, Lai, Yanzhao, Wang, Hao, Peng, Min, Ananiadou, Sophia, Huang, Jimin
LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading. Our evaluation of 15 representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals several key findings: While LLMs excel in IE and textual analysis, they struggle with advanced reasoning and complex tasks like text generation and forecasting. GPT-4 excels in IE and stock trading, while Gemini is better at text generation and forecasting. Instruction-tuned LLMs improve textual analysis but offer limited benefits for complex tasks such as QA. FinBen has been used to host the first financial LLMs shared task at the FinNLP-AgentScen workshop during IJCAI-2024, attracting 12 teams. Their novel solutions outperformed GPT-4, showcasing FinBen's potential to drive innovation in financial LLMs.