Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III

Shetty, Pranam, Upadhayaya, Abhisek, Shah, Parth Mitesh, Jagabathula, Srikanth, Nayak, Shilpi, Fee, Anna Joo

Sep-23-2025–arXiv.org Artificial Intelligence

As financial institutions increasingly adopt Large Language Models (LLMs), rigorous domain-specific evaluation becomes critical for responsible deployment. For advanced financial reasoning, the Chartered Financial Analyst (CFA) Level III exam is widely considered the gold standard. In this paper, we present a comprehensive benchmark evaluating 23 state-of-the-art LLMs on mock CFA Level III exams, which require answering challenging multiple-choice and essay questions. We evaluate reasoning and non-reasoning models, both proprietary and open-source, using three prompting strategies: zero-shot, chain-of-thought, and self-discover. We find that frontier reasoning models such as o4-mini, Gemini 2.5 Pro, and Claude Opus 4 demonstrate strong capabilities with chain-of-thought prompting, successfully passing the mock Level III exams. While there is little to separate the frontier models on multiple-choice questions, only a few models excel at the complex essay questions, which require analysis, synthesis, and strategic thinking. These results demonstrate significant progress in the financial reasoning capabilities of LLMs, which previously [13] could clear the Level I and Level II exams but struggled with the Level III exam, particularly the essay questions.
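To make the three prompting strategies concrete, the following is a minimal sketch of how a multiple-choice question might be wrapped under each strategy and how answers might be scored. The template wording, the names `TEMPLATES`, `build_prompt`, and `accuracy`, and the exact-match scoring rule are all illustrative assumptions; the abstract does not give the prompts or grading procedure actually used.

```python
from typing import Dict, List

# Hypothetical prompt templates, one per strategy named in the abstract.
# The wording is an assumption, not the prompts used in the paper.
TEMPLATES: Dict[str, str] = {
    # Zero-shot: ask for the answer directly.
    "zero_shot": "{question}\nAnswer with the letter of the correct option.",
    # Chain-of-thought: ask for step-by-step reasoning before the answer.
    "chain_of_thought": (
        "{question}\nThink step by step, then state the letter of the "
        "correct option on the final line."
    ),
    # Self-discover: ask the model to compose its own reasoning plan first.
    "self_discover": (
        "{question}\nFirst select the reasoning approaches relevant to this "
        "problem, adapt them into a step-by-step plan, then follow the plan "
        "and state the letter of the correct option on the final line."
    ),
}


def build_prompt(strategy: str, question: str) -> str:
    """Wrap a question in the template for the given strategy."""
    return TEMPLATES[strategy].format(question=question)


def accuracy(predictions: List[str], answers: List[str]) -> float:
    """Fraction of predicted letters that exactly match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have equal length")
    if not answers:
        return 0.0
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


if __name__ == "__main__":
    print(build_prompt("chain_of_thought", "Which allocation is most appropriate?"))
    print(accuracy(["A", "b", "C"], ["A", "B", "A"]))
```

Exact-match scoring of this kind only applies to the multiple-choice portion; essay responses would need a separate grading scheme, which this sketch does not attempt.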

large language model, machine learning, natural language

Genre:
- Research Report > New Finding (0.66)

Industry:
- Education > Assessment & Standards
  - Student Performance (0.77)
- Banking & Finance
  - Trading (0.68)
  - Financial Services (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)