FinTextQA: A Dataset for Long-form Financial Question Answering
Chen, Jian, Zhou, Peilin, Hua, Yining, Loh, Yingxin, Chen, Kehui, Li, Ziyuan, Zhu, Bing, Liang, Junwei
arXiv.org Artificial Intelligence
Accurate evaluation of financial question answering (QA) systems necessitates a comprehensive dataset encompassing diverse question types and contexts. However, current financial QA datasets lack scope diversity and question complexity. This work introduces FinTextQA, a novel dataset for long-form question answering (LFQA) in finance. FinTextQA comprises 1,262 high-quality, source-attributed QA pairs extracted and selected from finance textbooks and government agency websites. Moreover, we developed a Retrieval-Augmented Generation (RAG)-based LFQA system comprising an embedder, retriever, reranker, and generator. A multi-faceted evaluation approach, including human ranking, automatic metrics, and GPT-4 scoring, was employed to benchmark the performance of different LFQA system configurations under heightened noise conditions. The results indicate that: (1) among all compared generators, Baichuan2-7B competes closely with GPT-3.5-turbo in accuracy score; (2) the most effective system configuration on our dataset set the embedder, retriever, reranker, and generator to Ada2, Automated Merged Retrieval, Bge-Reranker-Base, and Baichuan2-7B, respectively; (3) models become less susceptible to noise once the context length reaches a specific threshold.
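The four-stage layout named in the abstract (embedder, retriever, reranker, generator) can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in, not the authors' implementation: the bag-of-words embedder, the cosine-similarity retriever, the term-overlap reranker, and the template generator only mark where models such as Ada2, Bge-Reranker-Base, or Baichuan2-7B would sit in a real system. All function names and parameters are assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedder (assumption): lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=3):
    """Toy retriever: keep the k chunks most similar to the question."""
    q_vec = embed(question)
    return sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)[:k]


def rerank(question, candidates, k=2):
    """Toy reranker (assumption): reorder by the number of shared terms."""
    q_terms = set(question.lower().split())
    overlap = lambda c: len(q_terms & set(c.lower().split()))
    return sorted(candidates, key=overlap, reverse=True)[:k]


def generate(question, contexts):
    """Toy generator: a real system would prompt an LLM with the contexts."""
    return "Q: " + question + "\nContext: " + " | ".join(contexts)


def answer(question, chunks):
    """Run the stages in order: retrieve, rerank, then generate."""
    return generate(question, rerank(question, retrieve(question, chunks)))


if __name__ == "__main__":
    corpus = [
        "a bond pays a fixed coupon to the holder",
        "equity represents ownership in a firm",
        "the weather is sunny today",
        "coupon payments on a bond are usually semiannual",
    ]
    print(answer("what coupon does a bond pay", corpus))
```

Keeping each stage as a separate function mirrors the paper's setup of swapping one component at a time while holding the others fixed.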
May-16-2024
- Country:
- South America > Colombia (0.04)
- North America > United States
- Virginia (0.04)
- Europe > Slovenia
- Drava > Municipality of Benedikt > Benedikt (0.04)
- Asia > China
- Hong Kong (0.04)
- Guangdong Province > Guangzhou (0.04)
- Genre:
- Research Report > New Finding (0.67)
- Industry:
- Government (1.00)
- Banking & Finance > Economy (0.68)
- Energy > Power Industry (0.67)
- Law
- Statutes (0.93)
- Business Law (0.93)
- Technology: