Dai, Yongfu
FinBen: A Holistic Financial Benchmark for Large Language Models
Xie, Qianqian, Han, Weiguang, Chen, Zhengyu, Xiang, Ruoyu, Zhang, Xiao, He, Yueru, Xiao, Mengxi, Li, Dong, Dai, Yongfu, Feng, Duanyu, Xu, Yijing, Kang, Haoqiang, Kuang, Ziyan, Yuan, Chenhan, Yang, Kailai, Luo, Zheheng, Zhang, Tianlin, Liu, Zhiwei, Xiong, Guojun, Deng, Zhiyang, Jiang, Yuechen, Yao, Zhiyuan, Li, Haohang, Yu, Yangyang, Hu, Gang, Huang, Jiajia, Liu, Xiao-Yang, Lopez-Lira, Alejandro, Wang, Benyou, Lai, Yanzhao, Wang, Hao, Peng, Min, Ananiadou, Sophia, Huang, Jimin
LLMs have transformed NLP and shown promise in various fields, yet their potential in finance remains underexplored due to the lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks and covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluations, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading. Our evaluation of 15 representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals several key findings: while LLMs excel in IE and textual analysis, they struggle with advanced reasoning and complex tasks like text generation and forecasting. GPT-4 excels in IE and stock trading, while Gemini is better at text generation and forecasting. Instruction-tuned LLMs improve textual analysis but offer limited benefits for complex tasks such as QA. FinBen has been used to host the first financial LLMs shared task at the FinNLP-AgentScen workshop during IJCAI-2024, attracting 12 teams. Their novel solutions outperformed GPT-4, showcasing FinBen's potential to drive innovation in financial LLMs.
Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models
Feng, Duanyu, Dai, Yongfu, Huang, Jimin, Zhang, Yifang, Xie, Qianqian, Han, Weiguang, Lopez-Lira, Alejandro, Wang, Hao
In the financial industry, credit scoring is a fundamental element, shaping access to credit and determining the terms of loans for individuals and businesses alike. Traditional credit scoring methods, however, often grapple with challenges such as a narrow knowledge scope and the isolated evaluation of credit tasks. Our work posits that Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks. To systematically explore LLMs for credit scoring, we propose the first open-source comprehensive framework. We curate a novel benchmark covering 9 datasets with 14K samples, tailored for credit assessment and a critical examination of potential biases within LLMs, together with a novel instruction-tuning dataset of over 45K samples. We then propose the first Credit and Risk Assessment Large Language Model (CALM) through instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks. We evaluate CALM and existing state-of-the-art (SOTA) open-source and closed-source LLMs on the built benchmark. Our empirical results illuminate the capability of LLMs to not only match but surpass conventional models, pointing towards a future where credit scoring can be more inclusive, comprehensive, and unbiased. We contribute to the industry's transformation by sharing our pioneering instruction-tuning datasets, credit and risk assessment LLM, and benchmarks with the research community and the financial industry.
LAiW: A Chinese Legal Large Language Models Benchmark (A Technical Report)
Dai, Yongfu, Feng, Duanyu, Huang, Jimin, Jia, Haochen, Xie, Qianqian, Zhang, Yifang, Han, Weiguang, Tian, Wei, Wang, Hao
With the emergence of numerous legal LLMs, no comprehensive benchmark currently exists for evaluating their legal abilities. In this paper, we propose the first Chinese legal LLMs benchmark based on legal capabilities. Through the collaborative efforts of legal and artificial intelligence experts, we divide the legal capabilities of LLMs into three levels: basic legal NLP capability, basic legal application capability, and complex legal application capability. We have completed the first phase of evaluation, which focuses mainly on basic legal NLP capability. The evaluation results show that although some legal LLMs perform better than their backbones, there is still a gap compared to ChatGPT. Our benchmark can be found at URL.