FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain

Hu, Tiansheng, Hu, Tongyan, Bai, Liuyang, Zhao, Yilun, Cohan, Arman, Zhao, Chen

Oct-20-2025–arXiv.org Artificial Intelligence

Recent LLMs have demonstrated promising ability in solving finance-related problems. However, applying LLMs in real-world finance applications remains challenging because of the high-risk, high-stakes nature of the domain. This paper introduces FinTrust, a comprehensive benchmark specifically designed for evaluating the trustworthiness of LLMs in finance applications. Our benchmark focuses on a wide range of alignment issues grounded in practical contexts and features fine-grained tasks for each dimension of trustworthiness evaluation. We assess eleven LLMs on FinTrust and find that proprietary models like o4-mini outperform in most tasks, such as safety, while open-source models like DeepSeek-V3 have an advantage in specific areas like industry-level fairness. For challenging tasks like fiduciary alignment and disclosure, all LLMs fall short, showing a significant gap in legal awareness. We believe that FinTrust can be a valuable benchmark for evaluating LLMs' trustworthiness in the finance domain.

large language model, machine learning, natural language

Country:
- Asia (0.28)

Genre:
- Research Report (1.00)

Industry:
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance
  - Trading (1.00)
  - Financial Services (0.68)
  - Credit (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
