A Methodology for Assessing the Risk of Metric Failure in LLMs Within the Financial Domain

Flanagan, William; Das, Mukunda; Ramanayake, Rajitha; Maslekar, Swanuja; Mangipudi, Meghana; Choi, Joong Ho; Nair, Shruti; Bhusan, Shambhavi; Dulam, Sanjana; Pendharkar, Mouni; Singh, Nidhi; Doshi, Vashisth; Paresh, Sachi Shah

Oct-17-2025–arXiv.org Artificial Intelligence

As Generative Artificial Intelligence (GenAI) is adopted across the financial services industry, measuring model performance has become a significant barrier to adoption and usage. Historical machine learning metrics often fail to generalize to GenAI workloads and are frequently supplemented with Subject Matter Expert (SME) evaluation. Even with this combination, many projects fail to account for the unique risks that come with choosing specific metrics. Additionally, many widely used benchmarks created by foundational research labs and academic institutions fail to generalize to industrial use. This paper explains these challenges and provides a Risk Assessment Framework that enables better application of SME evaluation and machine learning metrics.

large language model, machine learning, natural language, and 18 more
